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Abstract 

This thesis proposes a timing simulator (rsim) based on a uniquely simple transistor model. RSIM 
allows a designer to determine both the functional and approximate timing characteristics of a MOS 
network with more accuracy than gate-level simulation, and using larger circuits than are 
accommodated by circuit analysis programs. In rsim, transistors are modeled as resistors; the logic 
states of a transistor's terminal nodes determine its effective resistance. Using this model, a MOS 
network is simulated as a network of resistors where each node's value is determined by the resistance 
of its connections to various inputs. Transition times are determined from the RC time constant 
calculated for the node by examining the surrounding network; (R from the transistors, C from the 
interconnect and gate capacitance). The network's behavior as inputs are given values is calculated by 
an efficient event-driven algorithm. 

Two changes to the underlying model are also investigated: 

(1) further simplifying the transistor model to an on/off switch (which can be 
thought of as a degenerate resistor). Several approaches to switch-level 
simulation are developed, one particularly well-suited for implementation using 
parallel hardware. 

(2) modeling the behavior of a network of switches by a system of logic equations. 
Various compilation strategies are evaluated for producing code that implements 
the system of equations. 
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CHAPTER ONE 



INTRODUCTION 



Simulation plays an important role in the design of integrated circuits. Using simulation, a 
designer can determine both the functionality and the performance of a design before the expensive 
and time-consuming step of manufacture. The ability to discover errors early in the design cycle is 
especially important for MOS circuits, where recent advances in manufacturing technology permit the 
designer to build a single circuit that is an order of magnitude larger than ever before possible. This 
thesis presents three new algorithms designed specifically for the simulation of large digital MOS 
circuits. 

Today's MOS circuits offer special challenges to a simulation program, challenges that are not met 
very well by current simulators. New integrated circuits can incorporate hundreds of thousands of 
transistors; the sheer number of transistors dictates that a simulation algorithm use simple, 
computationally efficient transistor models. In addition, designers take advantage of the symmetry of 
the MOS transistor to build circuit configurations with behavior beyond the ken of traditional logic 
simulators. The new simulators introduced here are designed to meet these challenges. 
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1.1. Overview of the thesis 

To use a simulator, the designer enters a design into the computer, typically in the form of a list 
of circuit components where each component connects to one or more nodes. A node serves as a wire, 
transmitting the output of one circuit component to other components connected to the same node. 
The designer then specifies the voltages or logic levels of particular nodes, and calls upon the simulator 
to predict the voltages or logic levels of other nodes in the circuit. The simulator bases its predictions 
on models describing the operation of the components; a simulator is characterized by the types of 
component models it employs. Two of the more popular approaches are: 

• component models based on the actual physics of the component; for example, a 
transistor model that relates current flow through the transistor to the terminal 
voltages, device topology, and manufacturing parameters of the actual device. 

• component models based on a description of the logic operation performed by the 
component, eg, nand and NOR gates. 

The first type of model is found in circuit analysis programs such as asTap [Weeks73] or SPICE 

[Nagel75] which try to predict the actual behavior of each component with a high degree of accuracy. 

Current circuit analysis programs do the job well, perhaps too well; at no small cost, they provide a 

wealth of detail, at sub-nanosecond resolution, about the voltage of each node and the amount of 

current through each device. (For example, a properly calibrated circuit analysis program is able to 

predict, within a few per cent, the amount of current that flows through an actual transistor.) This 

level of detail would swamp the designer if collected for the entire circuit while simulating, say, a 

microprocessor. Fortunately, the designer is spared this fate, since the computational cost of circuit 

analysis restricts its applicability to circuits with no more than a few hundred devices. 

One solution to the problem of simulator performance is to adopt a simpler component model, 
such as the gate-level model introduced above. This approach works well when dealing with 
implementation technologies that adhere to gate-level semantics (eg., bipolar gate arrays). However, 
mos circuits contain bidirectional switching elements that cannot be modeled by the simple 
composition of Boolean gates. Since many of the circuit techniques that make MOS attractive for LSI 
and vlsi applications take advantage of this non-gate-like behavior, it is important to model such 
circuits accurately. 

This thesis explores the possibility of providing the essential information (functionality and 
comparative timing) for large digital circuits by using models that bridge the gap between the gate- 
level and detailed models discussed above. The goals to be met by these new models are summarized 



in the following list: 

(i) ITie underlying model must be computationally tractable for large circuits. The 
empirical nature of the verification provided by simulation suggests that it must 
be applied extensively if the results are to be useful; timely simulation 
encourages this. 

(ii) Transistor-level simulation is necessary to accurately model the circuit structures 
found in MOS designs. This allows the designer to simulate what was designed — 
an advantage, since requiring separate specification of a design for simulation 
purposes only introduces another opportunity for error. 

(iii) The results must be correct, or at least conservative; a misleading simulation that 
results in unfounded confidence in a design is probably worse than no simulation 
at all. Here, we must trade off the conflicting desires of accuracy and efficiency. 

Two models are examined in detail by die thesis: 

• a linear model in which a transistor is modeled by a resistance in series with a 
voltage-controlled switch. The state of the switch is controlled by the voltage of 
transistor's gate node. 

• a switch model, similar to the linear model, except that resistance values are limited 
to one of two quantities: for for n- and p-channel devices, and 1 for depletion 
devices. 

mos circuits are easily transformed to use either model, as illustrated by the following figure, 





A J 




(a) original circuit (b) linear model (c) switch model 

Figure 1.1. Two approaches to modeling a simple MOS circuit 

The linear model forms the basis for the RSIM simulator. In RSIM, networks of transistors and electrical 
nodes form an R-C network (R for the transistors, C for the interconnect and gate capacitance); the 
network's behavior under different inputs is calculated by a selective-trace (event-driven) algorithm. 
The comparatively fast "pseudo circuit analysis" that is possible with the linear model allows the 
designer to determine both the functional and approximate timing characteristics of a network. RSIM 
goes a long way towards meeting the three goals outlined above. The algorithm employed to estimate 
the behavior of a linear network is much faster than a typical circuit analysis program. Resistors are 



inherently bidirectional; the network analysis makes no a priori assumptions about the direction of 
current flow through each resistor. Finally, the results are at least qualitatively correct and, in general, 
conservative — ■ in some cases more conservative than designers themselves might like. With the 
appropriate choice of model parameters, the results can even be quantitatively usefuL 

The switch model is a simplification of the linear model that is useful when only a circuit's 
functionality is of interest (ue., no information on performance is wanted). Like a traditional gate-level 
simulator, a switch-level simulator bases its predictions on an abstraction of the actual circuit, but the 
switch model is able to handle the bidirectional nature of MOS transistors much more successfully than 
a gate-level model. The switch model is incorporated by ESIM, a simulator that has seen extensive use 
in the last few years. 

Certainly a major goal of RSIM and ESIM is to provide a fast, useful simulation of MOS circuits, 
but the story does not end there. Another motivation for new simulation algorithms is the dunging 
nature of the design community. In order to cope with the increasing complexity of integrated circuit 
design, new design methodologies have developed (eg, [Mead80]) that impose constraints on die way 
circuits are constructed. One can no longer afford to hand-craft each transistor, so rules of thumb are 
created to aid in the choice of transistor sizes. Clever circuit configurations are avoided m favor of 
circuits composed under the guidance of composition rules (eg., [Bell81J) that rule out arbitrary circuits 
and the obscure electrical behavior they imply, f 

These new design methodologies have opened up the field of LSI design to a new breed of 
"Mead and Conway" designer, te, a designer who is a sophisticated architect, but who is not a 
specialist in LSI technology. An important aspect of the simulators described in this thesis is that their 
underlying models are easily understood by this new breed of designer. The abstractions embodied by 
the simulators are faithful enought to the actual electrical behavior of a circuit that the achievement of 
a successful simulation run indicates freedom from a large class of potential failure modes. If a 
simulation does point out an error, it does so in a manner that leads even the novice designer to a 
good understanding of the circuit as actually designed and the ways in which it might differ from the 
intended design. 

However, the simulators are based on models of actual behavior. As with any model, 



tState-of-the-art designs intentionally exploit the "obscure" behavior of certain circuits (eg., sense amplifiers), often 
to considerable commercial advantage. RSIM and its relatives arc not as useful for this type of design as convention- 
al circuit analysis programs. But the professionals engaged in such well-focused designs are not the audience ad- 
dressed by Mead and Conway (and RSIM). 



discrepancies are likely to exist between the model predictions and the actual behavior of a circuit 
The tools described here attempt to be conservative, Le., to give pessimistic predictions, but this cannot 
be guaranteed. Thus, it is important that the designer become acquainted with the inner workings of 
the models and their shortcomings. The tools perform a calculation one could do by hand (only faster 
and with greater accuracy and consistency) — they should not be treated as black boxes. The models 
presented here are simple enough to enable any designer to gain the necessary understanding. 

A final motivation for new simulation technology is the desire to improve simulator performance. 
It seems that digital computers ought to be well suited for the simulation of digital logic. 
Unfortunately, current simulation schemes involve several layers of interpretation (e.g., command 
interpretation, access to the network data base, model evaluation), and their performance suffers as a 
result Happily, much of this overhead can be eliminated through the application of traditional 
compilation techniques. This is the theme of the final section of the thesis, and the motivation for the 
development of CSIM, a combination compiler/simulator. CSIM compiles a network into a simulation 
subroutine; the subroutine contains code to compute die new value of each node from its old value 
and the values of other nodes in the network. The compilation is particularly easy when the node is 
the output of a logic gate, and the work presented here extends the compilation technique to any node 
in a mos circuit. Simulating the network entails executing the subroutine repeatedly until no nodes 
change value. If the circuit is very active, i&, if many nodes change value each time the network is 
simulated, the simulation subroutine computes new node values many times faster than the 
corresponding event-driven simulation. There has been much interest recently in special purpose 
hardware for simulation {Pfister82, Zycad83]. It may be that such developments are premature, and 
that substantially better simulation performance can still be obtained from general-purpose computers. 

The relationship among RSIM, ESIM, and CSIM is illustrated in the table below. 
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ESIM 


CSIM 


node values 


logic-level 
(from voltages) 


logic-level 


logic-level 


model level 


transistor 


transistor 


node equations 


components 


resistors & 
capacitors 


switches & 
capacitors 


equations 
(from switches) 


scheduling 


event-driven 


event-driven 


compile-tirne 


relative speed 


1 


.5-3 


.1 - 100 



No one simulator has a speed advantage, for reasons explained in subsequent chapters. It is not 
unusual to use all three simulators during the course of a design, since each brings out a different 
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aspect of a circuit's behavior. ESIM is often used during the early stages of a design when the designer 
is fleshing out the logic. RSIM is used to determine which portions of the design are in need of a 
careful performance analysis; usually the performance of most of the circuit can be debugged with the 
level of detail provided by rsim. Finally, csim is useful for long simulation runs intended to verify the 
functionality of the design through extensive diagnostics. 

This thesis presents the new models and their accompanying simulators in detail, exploring the 
ramifications of each model and discussing the accuracy and usefulness of their predictions. The next 
section gives a brief outline of the remaining chapters. 

1.2. Outline of the remaining chapters 

The thesis has three main parts. The first part focuses on the linear model and the rsim 

simulator. 

Chapter 2 description of the switch/resistor transistor model incorporated by 
rsim; outline of the method for calculating a node's value using the 
linear transistor model; propagation of changes through the network; 
choosing model parameters; analysis of sample circuits using linear 
model. 

Chapter 3 justification of the linear model by analysis of true behavior of MOS 
logic gates; comparison of actual voltages and propagation delays 
with rsim's predictions; proposal for modifications to the model 
based on insight gained during analysis; analysis of sample circuits 
using updated model. 

Chapter 4 details of converting the linear model into a workable simulation 
algorithm; optimizations for improving simulator performance; 
mechanisms for controlling the voltage and transition time predictions 
for specific nodes; review of the successes and failures of the linear 
model. 

The second part (Chapter 5) presents the switch-level model. The chapter begins with a 
discussion of the representation of node values and explains why many extant simulators adopt a 
representation that leads to unnecessary difficulties. Next, two switch-level algorithms are presented. 
The first is a straightforward adaptation of the rsim algorithm, replacing its resistance computations 
with simpler ones that reflect the resistance value constraints of the switch model. The second 
algorithm is based on an entirely different approach; each computation handles a single transistor and 
uses only local information (the type of the transistor and the states of its terminal nodes). The 
computation is easy to understand and appeals to our intuition about the way transistors really 
operate. The simulation proceeds by repeatedly computing new node values for the source and drain 



■jiJwas&tw**^^ 



11- 



nodes of individual transistors, choosing the transistors in any convenient order. The simulation is 
complete when no further changes in the network state are possible. The termination of this 
relaxation algorithm is proved, and the final network state is shown to be independent of the order in 
which the individual computations are performed. The second algorithm is well suited for 
implementation on the new parallel architectures just now becoming available; the approach discussed 
here is a first cut at designing simulation algorithms tailored for use on parallel engines. 

The third part (Chapter 6) investigates the possibility of using various compilation schemes to 
improve the performance of the switch-level simulator. A technique is proposed for constructing a set 
of equations for each node in the network. These equations relate the new value of a node to its 
current value and the values of other nodes in the network. The network can be simulated by 
evaluating each node's equations in turn; several ways of ordering the nodes for evaluation are 
discussed. The section concludes with several examples of simulation routines that were compiled 
directly from the network data base. When executed, these routines result in a simulation several 
orders of magnitude faster than otherwise possible. 

The thesis concludes with a discussion of other work in the area of simulation and its relationship 
to the ideas presented here. 
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CHAPTER TWO 



A Linear Network Model for MOS Simulation 



The electrical model described in this chapter can be used as the basis for a logic-level simulation 
of a network of MOS transistors. Other models are of course possible, ranging in accuracy and detail 
from circuit analysis to high-level functional simulation. While the chosen model does not encompass 
many of the operational details of real MOS networks (most notably, detailed transistor modeling) it is 
adequate to efficiently determine the basic functionality and the approximate timing characteristics of 
a network. Short circuits, charge sharing, nodes with multiple drivers, bidirectional "pass" transistors, 
and so on are modeled correctly. 

The first section describes the switch/resistor transistor model incorporated by RS1M. Using this 
model, a MOS network is simulated as a resistor network where each node's value is determined by the 
resistance of its connections to various inputs. The second section outlines the method for calculating 
the value of each node. This is followed by an explanation of the use of component models to predict 
the propagation of new input values through a network. The fourth section discusses techniques for 
choosing model parameters and compares RSiM's predictions with those of a circuit analysis program. 
The chapter concludes with a summary of the model's ingredients. 
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2.1. RSIM's transistor model 

The transistor model in RSIM can be quite simple since it is only used to predict the final logic 
state of a node and the length of time each state transition takes. As an example of how the model 
works, consider a simple inverter: one can think of the effective resistance of its component devices at 
any moment as 



» rr „ - lds 'P u,lu P 
Keff-.pullup - — - 
v ds:pullup 



Reff .pulldown 



_ l ds:pulldom 
Vds .pulldown 



(2.1) 



The following figure shows the actual effective resistance of an inverter's pullup and pulldown as a 
function of the inverter's output voltage (assuming no load current). 



Sff 




Vrfc- 



ds:putlup 



ds:pulldown 



pullup 



pulldown 



v ds:pulldown 
Figure LI. Effective device resistances in an inverter 

Although the effective resistances of the transistors change as their terminal voltages vary, it might be 
possible to use "average channel resistances" to characterize the transistors' behavior. 

The other salient feature of a transistor's operation is its switch-like behavior. With certain 
voltages on a transistor's terminal nodes, it makes no connection at all between its source and drain 
terminals — the transistor is "off". As the relative terminal voltages change, the transistor turns "on", 
conducting current between its source and drain terminals. As illustrated in the previous figure, the 
transistor is more "on" at some times than others, but the distinction among the different "on" states 
can be ignored for simplicity. 

There are three basic types of transistor switches found in MOS circuits: 
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(a) n-channel switch 


(b) p-channel switch 



drain 
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HI 



source 
always ON 

(c) depiction switch 
Figure 2.2. Three types ofMOS transistor switches 

The difference between n-channel and p-channel switches is the logic level which turns on the switch. 
The depletion switch is always on; it is usually connected to vdd in a way that provides a source of 
current to keep its output node charged high. More precise distinctions between the switch types, and 
the need for a depletion device (and why an ordinary switch does not suffice) are discussed in Chapter 
3. 

One can build on the observations made above to construct a linear transistor model for RSIM: 



drain 



gate O- 



C 



gate 



o ' 



source 
(a) n-channel transistor 



Reff 



source 



open 



Vgate = ° 



closed v^ = 1 
unknown v te = unknown 



(b) RSIM model 



Figure 2.3. RSIM model for n-channel transistor 

It is easy to tabulate the sort of connection that exists between the source and drain terminals as a 
function of the gate voltage: 



Rds = 



R e jf switch closed (vgme — 1) 

00 switch open (vgate =0) 

\R e ff,oo] switch unknown (y ia te = X) 



(2.2) 



Note that uncertainty about the state of the switch leads naturally to an interval describing the 
resistance of the source-drain connection. In fact, all the network calculations use interval arithmetic, 
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and the bounds of the resulting intervals are used when converting voltages to logic states, etc.; no 
other mechanisms are needed to deal successfully with X states in the network. Models for other 
types of transistors differ in the way the position of the switch is determined from v gale : 



drain 



gale « — 




drain 



open 


v 
gate 


= 1 


closed 


v gate 


= 


unknown 


v gate 


= X 



*eff 

source source 

(a) p-channel transistor model (b) depletion transistor model 

Figure 14. R SIM models for p-channel and depletion transistors 



The effective resistance R e ff is determined separately for each transistor and depends on 

width, length dimensions of the active transistor area. Various non-linear effects 
make R e jf a more complicated function of the transistor geometry 
than just length/width. 

type Most mos circuits contain more than one type of transistor. The 

different types are distinguished by different values for their 
threshold voltage. Since the current conducted by a transistor is a 
function of its threshold voltage and hence its type, the modeling 
resistance also depends on the transistor type. 

context Accuracy in choosing the effective resistance can be improved by 

distinguishing several contexts in which a transistor may appear: for 
example, an enhancement transistor can be used as a pulldown or 
source-follower in addition to its default pass gate configuration. 
Surprisingly few contexts need to be recognized to encompass a large 
portion of digital MOS designs. 

The determination of Rtf is made once for each transistor and does not depend on any dynamic 

properties of the circuit to be simulated. During simulation the only device information RSIM uses 

about a transistor is its effective resistance. 

Actually rsim uses not one, but three effective resistances for each transistor. To understand 
why, recall that rsim tries to predict the transition time and final voltage for a node, as shown in the 
following figure. 



node 
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switching threshold 



final voltage 



transition time 



time 



Figure 15. R g fris used to predict transition time and final voltage 

One would like to calibrate the model to give accurate predictions for both quantities, but that is 
impossible with a single set of resistances. To solve this problem, RSIM uses three resistances for each 
transistor: 

Raatk when calculating the final voltage. 

Rjynbw when calculating the transition time for high-to-low transitions. 

Rdynhigh when calculating the transition time for low-to-high transitions. 
Two "dynamic" resistances are used so that the asymmetric behavior of pass devices can be accurately 
predicted. Computations involving R e jf are triplicated, one for each of the three actual resistances, so 
subsequent calculations can use the appropriate value. 



11 RSIM's node model 

Voltages in this model are quantized into one of three values; this corresponds to our intuition 
for digital logic and greatly simplifies the simulation calculations. If all node voltages are normalized 
to fall in the range [0, 1], then the possible quantized values are 

logic low — voltages in the range [0, v/ w J; 

1 logic high — voltages in the range [vhtgh, lfc 

X intermediate voltages, [vi ow ,mghl or unknown voltages, [0, 1] — to be 
conservative X is always interpreted as representing the larger interval; 

where v/ 0H > and \high are the predetermined logic thresholds. 

How is the value of a node determined? Using the transistor model described in the previous 

section, the original network is transformed into a network of resistors (formerly transistors) and 

capacitors (formerly nodes). If a node is not connected to any input, it is said to be charged with a 
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logic state determined by the state of the last driven node it was connected to. If two or more charged 
nodes in different logic states arc connected then charge sharing occurs. In this case, all the connected 
nodes reach the same logic state; this state is determined by the relative capacitances and initial logic 
states of the nodes in the stage. For example, if a large (high capacitance) node such as a data bus 
were connected by a pass transistor to a small node such as the input to a register cell, then the small 
node would "share" the charge of the large node as its final value regardless of the charge it had 
initially. Even nodes that ultimately have a connection to an input participate in charge sharing; the 
extent of their participation is governed by the relative sizes of the charge-sharing time constant and 
the time constant associated with the input connection. 

Electrically connected nodes form natural groupings, called stages, bordered by input nodes 
(usually vdd and GND). If nodes in a stage are allowed to share charge, all will reach the same 
voltage, V s kan, given by 

2 c i 2 ci + 2 c i 

v _ 1 nodes v 1 nodes X nodes i>% r\ 

" share :min — — V* " share .max — "V * ' 

2d °i 2d c i 

all nodes all nodes 

where the sums are over nodes in the current stage. Since nodes at logic state X contribute an 
undetermined amount of charge to the result, V S hare is an interval whose bounds represent the worst 
case assumptions about the actual values of X nodes. These bounds are compared with the logic 
thresholds when calculating the charge-sharing value: 



Charge-sharing value = 



' share '.max js Wow 

1 Vshareimin 2> m g h ( 2 - 4 ) 
X otherwise 



This calculation is not strictly accurate when the stage contains transistors with gates of X. Such 
transistors might not make any connection at all; invalidating the various sums in equation 2.3. An 
alternative charge-sharing calculation that addresses this problem is discussed in Section 4.1.1. 

When one accounts for the resistance between nodes, it is difficult to calculate transition times 
for any nodes that change value because of charge sharing. RSIM simply schedules any charge-sharing 

transitions so they happen immediately. A more reasonable time constant might be (2^Ri)Ceff where 

i 

the first term is the sum of all the resistances in the stage and 



Osr = 



2 «■ 

and AC nodes 



land* 



Ci 



nodes 
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Charge-sharing value = 1 
Charge-sharing value = 
otherwise 



(2.5) 



is the amount of capacitance in the stage that needs to be charged/discharged to reach the charge- 
sharing value. This time constant is surely an upper bound on the time of any transition in the stage. 
Note that transitions to X still happen immediately, a conservative assumption. 

If a stage is connected to one or more inputs, the inputs determine the final voltage of each node 
in the stage. The effect of inputs on a particular node is characterized by the Thevenin equivalent for 
the stage (including the inputs at the boundary), regarding the given node as the output: 



R dme 



J WV 



-C node ) 



'thev 







Cload 



Figure 2.6. Equivalent circuit for a network node 

Vthev a voltage interval [K_, V+] in the range [0, 1] specifying the possible voltages 
the output node may have. This value is calculated using each transistor's 
Rsatk resistance. 

Rdme a resistance interval [R _,/?+] in the range [0, oo]. Three versions of this 
value are calculated; Rdme-.low, using Rjynbw for each transistor; Rdrive-.high, 
using Rdynhigh'- and Rdrive.x (see section 4.1.2). The appropriate version is 
chosen depending on the final voltage predicted by V t hev 

V t hev and Rdme are generally intervals, since the effective transistor resistances from which they are 
derived might themselves lie in an interval. Chapter 4 describes how V t hev, Ciad, and Rdrtx are 
estimated for nodes in actual networks. 

It is sometimes useful to categorize a node according to its equivalent Rdme, i-e,, how it affects 
neighboring nodes to which it becomes connected by conducting transistors: 

input (Rdme = 0). Node is a designated input node (eg., VDD or gnd). The value of 
input nodes can only be changed by explicit simulator commands; the assumption is 
that inputs supply enough current to be unaffected by connections (possibly snorts to 
other inputs) made by transistors. 



19- 



driven (Rdrive < °°)- Node is part of a voltage divider between two inputs, i.e. s it is 
connected by transistors to other driven or input nodes. Driven nodes can affect the 
value of charged nodes without being affected themselves, but may be forced to an X 
state if shorted to a driven or input node that has a different logic level. 

charged (Rdrive - °°)- Node is connected, if at all, only to other charged nodes. 
Until reconnected to some other part of the network, charged nodes maintain their 
current logic state indefinitely (charge storage with no decay). 

If Rdrive is infinite, equation 2.4 predicts the correct final value for the node and no further work is 

needed. If Rdrive < °°. and the node is not an input, the final state of a driven node is calculated from 

the Vthev interval [K_, V+]: 



Final value = 



V+< v low 

1 V- > mgh 
X otherwise 



(2.6) 



As an example, consider several different states of a NOR gate: 
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(a) NOR gate (b) A = B = (c) A = 1, B = (d) A = 1, B = X 

Figure 2.7. Equivalent circuits for a NOR gate with different inputs 
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If the final value of a node differs from its charge-sharing value, then the appropriate event is 
scheduled ReffC e ff seconds in the future, where 



Reff = 



Rdnve-.high final value = 1 
Rdrm-.low final value = 
Rdrive- x final value - X 



(2.8) 
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C eff = 
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final value = 1 
final value = 
/?na/ value = AT 



(2.9) 



where the sums are computed for nodes in the current stage. Note that transitions to X are not 
immediate, but have a time constant related to the fastest transition the node can make. This means 
that a momentary short-circuit, such as that shown in the following figure, does not necessarily cause a 
node to become X; what happens depends on the relative sizes of the various time constants. 



H> 



y 



~* 1 * I — r- large capacitance 

Figure 2.8. A momentary short-circuit does not necessarily cause an X value 



If the delay through the inverter is small compared to the time constant of the output node, no X 
transition will be processed for the output node (one is scheduled, but is aborted when the pullup 
turns off). 

To better understand the interaction between the charge-sharing and final-value calculations, 
consider the following example: 



o-»i 




Figure 2.9. Sample circuit for charge-sharing and final-value calculation 

Assuming that Cb is initially charged low and that charge sharing happens immediately (an 
assumption rsim makes), there are several different scenarios: 

Ca«Cb node A goes low immediately because of charge sharing with B. Then, 
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both nodes arc driven high by the pullup — node A at time 
R \(Ca + Cb ), and node B at time (R \ + R 2XO + Cb )• 

Ca»Cb node B goes high immediately because of charge sharing with A; the 
pullup has nothing to contribute. 

Ca a Cb both A and B go to X immediately and are then pulled up with the same 
time constants as for Ca «Cb- 

If Ri is reasonably smaller than R\, then the assumption that charge sharing happens quickly is valid, 
and these scenarios are satisfactory. As R 2 approaches R \ in value, the time constants associated with 
charge sharing approach those of the pullup, and the assumption of immediate charge sharing is a 
relatively poor one.t Augmenting the charge sharing calculation as described in equation 2.5 would 
improve the prediction in this case. 

In summary, calculating a node's value involves two separate computations, each of which can 
generate a new event: 

(1) a charge-sharing event describing an immediate change in the node's state caused 
by the redistribution of charge among the capacitors for nodes in the current 
stage. This type of event is generated when two stages are merged (ie., a 
transistor turned on). 

(2) a final-value event describing what the final, driven state of the node will be. 
This type of event is generated when Rdme < °°- 

Chapter 4 describes the way these two events are reconciled with each other and with pending events 

to produce a final set of transitions for a node. 

13. RSIM's network model 

The networks^ simulated by rsim are made up of two basic components: 

(i) electrical nodes which serve as wires. Each node has a capacitance that is the 
sum of two contributions: (1) capacitance between other layers and the 
conducting layers that make up the node; and (2) capacitance from the gate 
junctions formed by the node. 

(ii) three-terminal transistors (mosfets) which act as switches. Each transistor 
conditionally connects two nodes (called the source and drain of the transistor) 
depending on the voltage of the third node (called the gate of the transistor). 

Some nodes (eg., vdd and GND) are designated as inputs that supply the current needed to change the 



tThis illustrates the asymmetry between the timing of transitions due to charge sharing and those due to the final 
value calculation, Le., R2 affects only the final value transition. This anomaly could be exploited to produce rather 
bizarre predictions, eg., a node changes faster if it is connected to a capacitor than if it is connected to an input! As 
a practical matter, circuit performance seldom depends on the timing of charge-sharing transitions, and these 
anomalies are not significant. 

$ Networks can be entered as schematics [Tcrman82] or extracted from layout information [BakerfJO]. The latter ap- 
proach provides fairly accurate estimates of the capacitance of each node. 
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voltage of a node by charging/discharging the node's capacitor. As the voltage of a node changes, 
switches controlled by the node open or close, making connections that cause the voltages of other 
nodes to change. It is RSiM's job to predict the dynamic behavior of a network of nodes and switches, 
estimating the voltage of each node, the state of each switch, and the charge/discharge rate when a 
node changes value. From the designer's point of view, this translates into knowledge about the logic 
level of each node and the transition time associated with each change of logic level. 

It is easy to build switch configurations that compute simple logic functions of node values. For 
example: 

X 
X 
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o not A 




not A 



(a) constant 1 (b) nMOS inverter (c) cMOS inverter 

Figure 2.10. Examples of switch configurations that perform logic operations 

The output node in figure 2.10(a) is connected to a depletion switch configured as a current source; its 
value is always a logic high. Such circuits are called pullups because their output nodes are always 
"pulled-up" to logic high. In figure 2.10(b) a "pulldown" switch has been added, controlled by node 
A. The pulldown is sized so that, when it is on, it conducts more current than the pullup supplies. 
When A is 1, the output node is "pulled-down" to 0. Of course, when A is 0, the pulldown is off and 
the pullup ensures that the output is 1; the net result is an inverter circuit Figure 2.10(c) is an 
inverter constructed from one p-channel and one n-channel device. Typically, the manufacturing 
process can provide either p-channel devices or depletion devices, but not both, in the same circuit 
More complicated logic circuits are constructed using series and parallel switch configurations. 
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Figure 2.11. Logic Junctions associated with series and parallel configurations 

If the two-switch circuits shown above replace the pulldown in figure 2.2(b), the result is a two-input 
nor or nand gate. 

In all the circuits presented so far, the inputs are electrically isolated from the outputs, Le., if the 
output signal is corrupted somehow — by a short circuit, for example — the input signals are 
unaffected. The isolation provided by the gate connection leads to a natural decomposition of the 
network into stages made up of nodes and transistors. Nodes belong to different stages only if they 
are guaranteed to be electrically isolated. For example, in the following circuit, nodes A, B, C, and D 
are all isolated from one another. Node E is not isolated from D, so it is in the same stage as D. 



*D 



inputs outputs 

A -»( stage 1 >> B 

B -*( stage 2 >> C 

B -*/**""*>>-» D 
V stage3 J 

Figure 2.12. Simple circuit that has three stages 

Note that VDD and GND (and, in fact, any input) are not treated as nodes in the ordinary sense when 
checking to see if two nodes belong to the same stage. For example, node B is not considered to 
connect to node C by a path involving GND and two of the pulldown transistors. Given a particular 
node, a tree walk of the network is performed to find all other nodes in the stage. The tree walk first 
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locatcs all "on" switches which have a source/drain connection to the original node. Nodes connected 
to the drain/source of those switches arc part of the same stage as the original node. ITic tree walk 
continues until it locates all nodes that can be reached from the original node by a path of "on" 
switches; this set of connected nodes and the "on" transistors that form the connections make up a 
single stage. Note that the decomposition of the network into stages is a dynamic process, ie, one 
that depends on the node values of the network.! For example, the following circuit can be 
decomposed into 2, 3 or 4 stages depending on the value of nodes A and B. 




* F = A xor B 



Figure 2.13. Circuit with multiple decompositions 



Node F is always in a separate stage. If A=0 and B=0, then C, D, and E all form a single stage; if 
A= 1 and B=0, then D is isolated from C and E; and so on. 

When rsim simulates a network, it does its analysis stage by stage. Since the values of nodes in 
a stage are closely related (the nodes are shorted together), it makes sense to calculate all the values at 
the same time. By the same reasoning, all the transistors and nodes that influence the value of a 
particular node are in the same stage as that node. Stages are the analogs of gates in a gate-level 
simulator. In a gate network, each node's value is determined by a single gate, and the output of a 
gate is electrically isolated from the inputs; the gate is the ideal unit of analysis. In MOS networks with 
bidirectional devices, the traditional gate model is not adequate; hence the motivation for stages. 



tThis differs from the notion of "transistor group" introduced by [Bryant81]. A transistor group contains all nodes 
that might become connected, ie.. a stage with all switches considered to be conducting. Transistor groups can be 
quite large — for example, m circuits with barrel shifters that potentially short together all bits in a data path — 
whereas stages are usually quite small 
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A simulation step starts when the designer changes the value of an input. (By definition, any 
node given a value by the designer is treated as an input node by the simulator.) The value of the 
input influences other pieces of the network in two ways: 



input 




input * 




Gv Gv 



(a) 



(b) 



Figure 114. Two ways in which an input affects a network 

The simulator first recalculates the values of nodes in stages connected to the input by the 
source/drain connections of conducting switches (figure 2.14(a)). Then, for each switch controlled by 
the input, stages on each side of the switch are analyzed (figure 2.14(b)). If the switch becomes 
conducting because of the new input value, the pieces of the network on either side form one large 
stage. If the switch just turned off, it partitions what was previously one large stage into two smaller 
stages. 

If a node changes value as a result of analyzing a stage, rsim calculates the transition time by 
estimating the length of time required to charge/discharge the node's capacitance. The name of the 
node, its new value, and the estimated time when the transition to the new value occurs are all 
remembered as an event. The simulator maintains a list of pending events, keeping the list sorted by 
time, with the earliest event first 

When processing new input values causes a node to change value, a new event is generated and 
saved on the event list After all inputs have been processed, the simulator processes events, starting 
with the first element of the event list For each event, the specified node is assigned its new value. 
Then, any stages affected by this change (as shown in figure 2.14(b)) are analyzed, possibly generating 
new events, which are then added to the event list The simulator continues processing events until 
the event list is empty. The network is said to have "settled" at this point, and the new input values 
have been completely propagated through the network. 
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Note that if no nodes change value when a stage is analyzed, no new events are generated. 
Portions of the network that remain quiescent arc not analyzed, since the simulator only analyzes 
stages affected by inputs or by nodes on the event list By limiting simulation effort to the changing 
portions of the network, the event list mechanism enables the simulator to handle large circuits.- The 
amount of computation required for a simulation step is proportional to the amount of circuit activity, 
not the size of the circuit 

To get a better feeling for the way a change propagates through a network, consider the 
following simulation of the XOR circuit presented in figure 2.13. Nodes A and B are inputs; values for 
the other nodes are determined by the simulator. 
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Figure 2.15. Waveforms for simulat 
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Event #1. Node A is set to 1 by the user. The simulator recalculates all stages 
affected by A, in this case, the stage containing nodes C, D, and E 
(which form one stage because C and D are 1). 

All three nodes are pulled down by the switch controlled by A, so events #2, #3, and #4 are 

scheduled to set C, D, and E to 0. Note that the simulator calculates a different transition time for 

each node. C changes most quickly since it is connected directly to the pulldown. D is the slowest 

since it discharges through the two pass devices connecting it to the pulldown. 

Event #2. C changes from 1 to 0, causing the stages containing D and E to be 
analyzed. 

At the time event #2 is processed, nodes D and E are still 1, although they both have events pending 

for transitions to 0. When node C goes low, it partitions what was once one large stage into two stages 

— one containing only D, the other containing both C and E. Analysis of the stage containing D 

shows that D is no longer pulled down, invalidating the upcoming transition. The simulator has 
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several choices: 

(1) Notice that D is currently 1, so just remove the pending event for D. This results 
in D never changing value. This is not a bad prediction if D is scheduled to 
change substantially after C. 

(2) Schedule another event (#5) for node D, which changes its value back to 1; set 
the event time so that #5 happens after #4. This choice is best if C and D are 
both scheduled to become in close succession. 

(3) Remove D's pending event as in (1), but report a glitch (an aborted transition) to 
the user [Thompson74]; a sort of compromise between (1) and (2). Some 
simulators only report glitches if die aborted event has been pending "long 
enough" [Nahm80]. 

(4) Schedule another event as in (2) that changes D's value back to 1; also change 
the pending event to be a transition to X, or, alternatively, remove the pending 
event and schedule an immediate transition to X. 

As one can see, scheduling a new event is a thorny issue when it involves a node that already has 

events pending. Since D's value does not really matter (it does not control any switches itself), the 

first alternative seems the most reasonable. Given the simplicity of the RSIM model, it probably does 

not pay to overly complicate the scheduling of events. The transition-time estimates are not accurate 

enough to allow subtle distinctions to be made based on the relative transition times of nodes; RSIM 

avoids choices (2), (3), and (4) since they involve such distinctions. Note that a similar problem arises 

for node E. It has an event pending for a transition to the correct value (E is still going low), but the 

event could be rescheduled to reflect a faster transition time since the pullup on node D no longer 

impedes the transition. Chapter 4 details the exact choices made by RSIM under various circumstances. 

Returning to the example: 

Event #3. Node E is changed to 0, causing the stage containing node F to be 
analyzed. F is calculated to change value, so event #6 is scheduled. 

Events #4,5. Discussed in the preceding paragraph. 

Event #6. F is set to 1. F does not affect any other stages, so no events are added 
to the event list 

At this point, the event list is empty, and the network has settled. If the user now changes node B to 

1, a somewhat simpler sequence of events ensues: 

Event #7. Node B is set to 1 by the user, causing the simulator to analyze the 
stage containing D. D is predicted to go low, resulting in the scheduling 
of event #8. 

Event #8. D is set to 0, separating C and E into different stages which are then 
analyzed. C shows no change, but E is scheduled to go high (event #9) 
now that it is disconnected from C's pulldown. 
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Event #9. E changes to 1, and as a consequence F is predicted to change to 
(event #10). Note that the low-to-high transition time can be very 
different than the high-to-low transition time; RSIM takes into account 
the relative sizes of the pullup and pulldown. 

Event # 10. Finally, F is set to 0. 

Once again the event list is empty, and the network has settled. 



2.4. Calibrating and using the RSIM model 

From a practical viewpoint, the success of RSIM depends to a large degree on the choice of the 
modeling resistance for each transistor. The principal goal of the calibration process is to choose 
resistances that lead to accurate predictions. Actually, there are two separate sets of resistances to be 
chosen: static and dynamic. Static resistances, used to estimate node voltages, are comparatively easy 
to choose. When a circuit does not depend on device ratios for correct operation — eg., a pulled-up 
node or a cmos gate — the values chosen for static resistances do not affect the voltage computation, 
since the nodes connect to only one polarity of input When a circuit makes a connection to inputs of 
different polarities — eg., a nMOS gate with a logic-low output — the intervening nodes become part 
of a voltage divider, and the transistor resistances must be chosen to predict the divider's output 
voltage. Since only the ratio of the pullup and pulldown devices is constrained, there is considerable 
freedom in choosing the actual resistance values. Of course, inauspiciously chosen values can run 
afoul of range and round-off problems in the computation, but such problems are easily avoided. 

A more interesting problem is the choice of appropriate dynamic resistance values. One 
approach involves performing a series of experiments designed to measure the resistance of each type 
of transistor in various circuit contexts: 
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Figure 2.16. Simple experiments for measuring channel resistances 



Ideally, the experiments should be performed using actual circuits; when this is impractical, a well- 
calibrated circuit analysis program can be used to gather the needed measurements. Each of the 
experiments entails measuring the length of time required for the output to rise or fall from its starting 
voltage to the switching threshold. (Section 3.4.1 describes the reason for using single threshold, and 
the method for choosing it) If the load capacitance is known, an appropriate channel resistance can be 
calculated, essentially inverting the computation performed by rsim. Appendix 2 presents the 
transistor resistances derived in this manner for a typical 5/i nMOS process. 

Unfortunately, while the experiments outlined above lead to usable predictions of circuit 

performance, the predictions are not as accurate as one might like. The problem with the experiments 

is that the resistance measurements are made in a rather artificial context Factors important in 

determining the behavior of a transistor in a particular circuit (eg., shape of the input waveform, 

Miller capacitances, etc.) are not measured by the proposed experiments. Since the simple RSIM model 

does not account for these factors, they are missing completely from the calculations, leading to 

inaccurate predictions. There are two alternatives: 

(1) Modify the rsim model to include effects deemed important when making 
performance predictions. It is difficult to start down this road and sull keep the 
model simple; carried to its logical conclusion, this course of action leads to a 
circuit analysis program — the very thing rsim tries to avoid. There are, 
however, alternatives that fall short of abandoning the simple model; these are 
discussed at the end of Chapter 3. 
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(2) Conduct more sophisticated experiments using circuit configurations found in 
actual designs. 

An example of the second approach is the following experiment: 

shapes input waveform load inverter 

4- I 4 ^ 



> 



pair delay 
Figure 2.17. Deriving resistances by measuring inverter pair delay 

The delay through a pair of inverters involves both a rising transition (measuring the pullup resistance) 
and a falling transition (measuring the pulldown resistance). The initial inverter provides an 
appropriately shaped input waveform; the last inverter provides a realistic output load. The measured 
pair delay is arbitrarily split into a rising delay and a falling delay (say, % and V4 respectively), so that 
the pullup and pulldown resistances can be calculated. This leads to good predictions for the chains of 
inverting logic so common in MOS designs. Similar experiments can be designed to measure other 
resistances. The danger in this approach is that, because of the ad hoc nature of the experiments, the 
resistances might be inappropriate for new circuit configurations. However, with a prudent choice of 
circuits during calibration and design, this danger can be minimized. 

The following examples are analyzed using the simple calibration given in Appendix 2. The 
results give a feel for the performance of the "pure" resistance model, and also set the stage for the 
model improvements suggested in Chapter 3. The calculation of node voltages is straightforward and 
is not mentioned in the discussion below, which focuses on the calculation of transition times. The 
first example is a path through a PLA: 
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Figure 2.18. Sample circuit showing path through PLA 



Transistor sizes are given in microns as width/length. When the clock signal goes high, the input 
signal (buffered by the inverter on the left) propagates through the input buffer and the two PLA 
planes. The following figure shows the equivalent resistor/capacitor network; resistances are given in 
KO and capacitances in pf. 
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Figure 2.19. Equivalent RC network for PLA circuit (shows dynamic resistances) 

Note that the pullup for node C is recognized as a depletion source-follower without considering the 
actual voltage on its gate. Since depletion devices are always on, the inverter which leads from node B 
to the gate of the pullup is ignored by RSIM, and the timing for node C is always controlled by node B. 
Also note that the resistance chosen for the pulldown for node B reflects the threshold drop of node 
A. 

When calculating Rdynlow, RSIM simply calculates the net resistance to ground, ignoring the 
effects of any pullups. For example, a falling transition for node B takes (16X-05) = 0.8ns. This 
approach is not only simpler, but is conservative. (Adding the pullup resistance actually decreases the 
fall time from the Thevenin point of view). Using this approach, the table shows the results of 
propagating two different data values through the PLA. The time of each node's transition is shown 
in nanoseconds, as predicted by RSIM and SPICE 
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The discrepancies between the RSIM and spice predictions (-28% in case 1, -14% in case 2) can be 
traced to the fact that the current RSIM model does not account for the shape of the input waveform 
when analyzing a stage, f This is particularly noticeable in case 1 for the transition of node E The 
long rise time of node D slows the falling transition of E to a considerable extent; a fact blithely 
ignored by RSIM. 

The second example is a section of the OM2 data path [Mead80] consisting of the logic to drive a 
register select line, a register cell, and a bus line. The path to be analyzed starts with the clock going 
high, driving the select line high, finally causing the register cell to discharge the pre-charged bus line. 
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Figure 2.20. Register select and bus drive circuitry from OM2 data path 



fExamining the times in this example, one might be tempted to multiply the effective resistances by a constant factor 
in an effort to improve the accuracy of the predictions. But not all predictions underestimate the true transition time, 
and, as will be seen in Chapter 3, there are other improvements that can be made that address the root of the prob- 
lem. 
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Figure 121. Equivalent RC network for OM2 data path example 



The comparative analysis is given below; rsim comes to within 9% of the spice prediction. 
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2.5. Summary 

The RSIM model can be summarized as follows: 

• Transistors are modeled as switches with series resistors. Three resistances are 
chosen for each transistor and used to predict node voltages and transition times. 
Resistance values are determined by experiments, either with actual circuits or 
using a circuit analysis program. 

• Using the transistor model, a network of transistors and nodes is simulated as a 
network of resistors (from transistors) and capacitors (from nodes). A node's 
value is determined by voltages calculated in two ways: (1) from charge sharing 
with electrical neighbors, and (2) from the Thevenin equivalent circuit for pieces 
of network connecting the node to the inputs. When a node changes value, the 
timing for the transition is given by an RC time constant calculated using the 
resistances and capacitances of the surrounding network. 

• The network is viewed as an assemblage of small stages, each simple enough that 
its operation can be predicted in a straightforward manner. Information 
propagates through the network as a series of events (changes in a node's value); 
each event leads to an analysis of affected stages using the models described 
above. The isolation between stages of digital circuits allows each stage to be 
analyzed separately; the relative independence of one stage from another is one 
reason why the very rough approximations of RSIM are so serviceable. 

Several factors important for making accurate performance predictions are missing from both the RSIM 

model and the simple calibration experiments proposed in section 2.4. Chapter 3 suggests some 

modifications to the model that correct the more important oversights. Many implementation details 
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unspccified in this chapter arc discussed in Chapter 4. Chapter 4 also catalogs the successes and 
failures of the RSI\1 model, as finally implemented. 
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CHAPTER THREE 



Justification of the Linear Network Model 



This chapter undertakes a performance analysis of logic gates and other digital circuits with the 
goal of establishing a physical justification for the rsim model. By comparing the resulting equations 
with those proposed by RSIM, one can judge the accuracy with which the RSIM model predicts circuit 
behavior. As an added benefit, insight into actual circuit operation helps to motivate model 
modifications that improve the accuracy of the predictions. 

The first section lays the groundwork for the analysis, presenting the first-order equations that 
describe the operation of MOS transistors. The second section describes the node voltages found in 
common digital logic circuits and compares the results to RSiM's predictions. The next two sections 
analyze the propagation delay of logic gates and other network components. Finally, several 
modifications to the RSIM model are proposed, and the resulting predictions are compared to those of 
the original model. 

3.1. Electrical models for mosfets and gates 

The active component in a mos circuit is the mosfet, a type of transistor. The mosfet has three 
terminals: the source and drain (two symmetric connections), and the gate. By convention, the source 
and drain are chosen such that %, the voltage of the drain with respect to the source, positive. v gs , 
the voltage of the gate with respect to the source, can be either positive or negative. Depending on 
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the relative voltages of the three terminals, the mosfct conducts varying amounts of current between 
the source and drain terminals. The amount of current conducted depends on the region in which the 
mosfet operates. There are three possible regions: 



ids = 






vg$ - m < 


(off) 


y(v*j - Vth) 


0< v gs -v lfl < vos 


(saturated) 


*(VgS ~ V,h -r-h'ds 


vgs - m > vds 


(linear) 



(3.1) 



where v t h is the threshold voltage of the mosfet and 



)C = ^ M C «^(2S microam P s ) 
/ / voir 



(3.2) 



is a constant that depends on the width w and length / of the particular mosfet under consideration. 
The numeric estimate is for a typical nMOS process. These equations ignore second order effects on 

ids- 

In an nMOS process, there are two types of mosfets, distinguished by the setting of their 
thresholds: 



type of device threshold (VDD s 1) 



n-channel 
depletion 



v le st 0.14 



v td 



-0.6 



As we saw in Chapter 2, the simplest form of logic gate that uses these devices consists of: 

a single depletion pullup with its gate and source attached to the output node and its 
drain attached to VDD, and 

one or more pulldown paths connecting the output node to ground, each path 
containing one or more n-channel devices. 



-37 




not A 




A nand B 




A nor B 



HLHI 



Figure 3.1. nMOS logic gates 

The depletion pullup is configured so that Vg S -.pu = 0; since the threshold of a depletion device is 
negative, v g s\pu - v^ > 0, and the pullup is never off. Each n-channel pulldown is configured to be 
on when its gate voltage exceeds v te and off otherwise. If all the n-channel devices in a particular 
pulldown chain are conducting, the output load capacitance is discharged through the pulldown path 
and the output voltage is lowered (v oul = v / = logic low); otherwise the pullup pulls the output high 
("out = "oh = logic high). 

Equation 3.1 can be specialized for a depletion pullup, using the fact that Vgs :pu is always zero: 



ipu — 



f-lvtfl 2 



\"td\ <(l-v oa /) 



*pu( I "td I sr^-Xl - "out) I "td I > (1 - "out) 



(3.3) 



where v^i is the voltage of the gate/source node of the pullup. Since the drain of the pullup is 
connected to vdd, Vds-.pu = 1 - "out- To avoid confusion, the equations will be written in terms of 
| \td I since v t( j is negative. The current conducted by the n-channel pulldown in an inverter is given 
by: 



ipd ~ 



^in ~ v te ) 2 

, "out x 

KpdWn -Vte 2~)Vo tt / 



"in ~ "te < 
0< v in -"te < "out 

"in ~ "te > "out 



(3.4) 



where v,-„ is the voltage of the gate node of the pulldown. Note that the source of the pulldown is 
connected to ground (v,„ = v &s ;pd) and the drain is connected to the inverter's output (v ut = "ds:pd)- 
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For proper operation of the inverter, the sizes of the pullup and pulldown are chosen so that ipd > i pu 
when the pulldown is on. 

To understand the behavior of an inverter in more detail, it is useful to plot ids of the 
component devices as a function of the inverter's output voltage: 



'ds 



A 

sat 


linear 












-^ — > 



'ds 



A linear 


sat 




/ ^ Z 




- t 


fcf^' 


I 


-» 



v in = 1 



increasing v- 



Hv td l 
(a) depletion pullup 



'out 



l-v tt 1 v out 



(b) enhancement pulldown 



Figure 3.2. mosfet 1-V characteristics 

The ids of a depletion pullup depends only on v ow and thus a single curve suffices to show their 
relationship. For the n-channel pulldown, there is a family of curves for ids corresponding to different 
values of v/„. 

The intersection of the ids curves for the pullup and pulldown shows the inverter's output 
voltage, given a particular input voltage: 



»rt« a 



'ds A 




"out 




(a)v out = lwhenv in <v te 



< b > v out = v ol whenv in = 1 
Figure 3.3. v ( is determined by i and i^ 

In fact, one can plot the DC voltage transfer curve for an inverter, which shows the inverter's output 
voltage as a function of its input voltage. 
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Figure 3.4. Voltage transfer curve for an inverter 

The four regions (I -IV) of the curve correspond to various combinations of the pullup's and 
pulldown's operating regions. Note that the relationship between v,„ and v oul shown in figures 3.3 
and 3.4 applies when the voltages are allowed to stabilize; in a circuit with changing voltages, the 
relationship between the v /n and v^ is considerably more complicated, as will be seen in section 3.4. 

The next few sections use the equations presented here to develop equations for the quantities 
predicted by RS1M — node voltages and transition times — so that the RSIM model can be evaluated 
and perhaps improved. 

3.2. Node voltages 

When v/„ < v K , the n-channel pulldown conducts no current; the depletion load continues to 
conduct as long as v out < 1. Therefore, the logic high output voltage of an inverter is given by the 
equation: 



Voh = 1 



(3.5) 



When Vi„ > v te , the n-channel pulldown is on and the output node reaches an equilibrium voltage v /, 
which is determined by (1) the relative sizes of the pullup and pulldown and (2) the gate voltage on 
the pulldown. v / is that voltage where the current of the pulldown (at this point in its linear region) 
is balanced by the current of the pullup (in saturation): 



*pd(Vl 



v ol x */>« i 1 2 

■in ~ v le ~ -J-) v ol = -y-|Vft/r 



(3.6) 
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If one assumes that v,„ = 1 (as is the case when v oh of the previous stage is 1) and that 
v i « 1 _ v «< th en 



v / 



„ 1 \vid\ 2 M 0.21 
~ 2/? (l-v«) ~ fl 



(3.7) 



where R = ^- = — — is the ratio of the sizes of the pullup and pulldown. R is chosen so as 

to guarantee that the low output of a gate turns off the pulldowns of gates connected to the output, 
Le., so that v i is less than v te by a comfortable margin; typically R is chosen to be about 4 if v,„ = 1. 
Now consider the RS1M model for an inverter: 



>i 



-o v. 



out 



^ 



-0 V, 



out 



*pd 



(a) v in at logic low 



(b) Vj n at logic high 



Figure 3 5. RSIM inverter model 

When % is low, the pulldown is off and the inverter is modeled with a single resistor. In this 
configuration, RSIM predicts 



Voh.RSIM = 1 



(3.8) 



agreeing with equation 3.5, independent of the value chosen for R pu . When v,„ is high, the inverter is 
modeled by a voltage divider, rsim predicts 



Vol:RSlM = 



Rpd 



Rpd + Rpu 



(3.9) 



One should choose R pu and Rpd so that v i : rsim is the same as v /, as given by equation 3.7. Thus 
the rsim model can accurately predict the output voltages of logic gates; in fact, there are two 
unknowns and only one equation to satisfy, so there is some freedom in choosing the static resistance 
values. 
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Thcre are circuits for which RSIM docs not properly predict node voltages. For example, in the 
following circuit, the voltage of node B only reaches 1 - v K : 



t> 



1 





(a) sample circuit 



(b) equivalent resistor networks 



Figure 3.6. Sample circuit illustrating voltage drop across pass transistor 



N-channel devices configured the same way as the horizontal transistor in figure 3.6(a) are called 
"pass" transistors, and are used to implement dynamic latches, various types of steering logic, and so 
on. Figure 3.6(b) shows the equivalent resistor networks for the circuit According to this model, the 
voltage for node B should reach VDD when node A is low. In the actual circuit, however, the pass 
transistor cuts off when B reaches 1 - v te since, at that point, Vgs-.pass = vte- In general, the source 
voltage of a pass transistor never rises above a threshold-drop below its gate voltage. Thus the RSIM 
model incorrectly predicts the voltage of node B. 

In fact, the network analysis performed by RSIM does recognize that node B never reaches VDD. 
As shown by several examples in Chapter 2, the resistance for a pulldown with a gate that has a 
threshold voltage drop is not chosen in the same way as the resistance for a normal pulldown. In 
other words, the value of R5 in figure 3.6(b) reflects the knowledge that node B has a threshold drop. 
This knowledge could also be used to adjust the prediction of B's voltage, but this is not currently part 
of the calculation. 

There are many other circuit configurations that are beyond the ability of RSIM to analyze, 
although most such circuits could not, in all fairness, be called digital. One important exception, which 
RSIM does not handle, but which occurs in performance-critical digital circuits, is called bootstrapping. 
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Figure 3.7. Bootstrap circuits lead to voltages greater than VDD 

Node A is small compared to node B, to which it is capacitively coupled. The coupling capacitor need 
not be explicit; often enough coupling is provided by the gate/source overlap capacitance of the 
transistor controlled by A. Node A is driven high through a pass transistor, and in turn enables the n- 
channel pullup that is controlled by A and connected to node B. Since the capacitance of A is small 
compared to that of B, A reaches a significant voltage before the voltage of node B begins to change; 
the difference is usually around 3 volts in common bootstrap configurations. As the voltage of node B 
increases, the coupling capacitor maintains this initial voltage difference between nodes A and B, and 
so the voltage of A increases correspondingly, f It is not unusual for node A to reach 8 volts or more. 
This, of course, increases the voltage on the gate of the pullup, which in turn increases the current 
flowing into node B. The net result is that node B reaches its final value much more quickly than one 
might expect Just as important, the voltage of B rises all the way to VDD instead of stopping two 
threshold drops below, as a simple analysis might predict 

Both the faster transition time and higher-than-expected voltage for node B are completely 
missed by rsim. Since such circuits are often used in time-critical portions of the network, it would be 
nice for rsim to make correct predictions in this case. Unfortunately, there is no simple change to the 
simple rsim model that achieves the desired result However, by systematically replacing bootstrap 
circuits with more conventional circuits sized to give the same performance, RSIM can produce the 
correct results. This technique is discussed in the section on escape mechanisms in Chapter 4. 

In summary, RSIM 



tThe pass device through which node A is driven isolates A from the driving circuitry. After the voltage of node A 
reaches 1 — v te , the pass device cuts off, and stays off no matter large the voltage on node A becomes. This is be- 
cause Vg :pass — v te will be less than the voltage on either the source or the drain. 
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(i) predicts the output voltage of logic gates with acceptable accuracy. 

(ii) does not predict threshold drops introduced by pass transistors, but docs perform 
a static analysis of the network to recognize transistors whose gates are subject to 
a threshold drop, and adjust the modeling resistance accordingly. 

(iii) does not handle bootstrap and other more exotic circuits. However, a pattern 
matching/replacement technique is available for substituting equivalent circuits 
that simulate correctly. 



3.3. Propagation delay: overview 

When choosing a single number to characterize the timing behavior of a circuit, one often settles 
for determining the propagation delay: a measure of the length of time required for a change in an 
input value to be reflected in the output value. In digital circuitry, a significant change is one where 
the signal changes from logic low to logic high or vice versa. For a particular transition it is common 
to define "change" in relation to a threshold; the signal is said to change when it crosses the threshold. 
Consider the following single input, single output circuit: 



o- 



v in 



JL 



CIRCUIT 



I 



V 

out 



Figure 3.8. Test setup for measuring propagation delay 
The propagation delay is defined as 

tp — touiput — tjnput (3.10) 

where 

toutput is the time when the output voltage crosses the output threshold voltage; 

(input is the time when the input voltage crosses the input threshold voltage. 

This definition works well for a transition between and 1; however, delays associated with a 
transition to the X state are still not well defined since it is unclear whether the signals in question 
cross the threshold or not. Aside from this technical difficulty, the notion of propagation delay 
involving X's is rather muddy since X is not a "real" logic value, but more of an error state. The 
simulation algorithm must assign some delay to such a transition, and RSIM conservatively chooses the 
fastest possible transition of which the node is capable (sec equations 2.8 and 2.9). 
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The next step is to choose the input and output thresholds, a choice that depends on the 
particular circuit to be analyzed. There arc two important criteria for choosing thresholds: 

(1) The delay should never be negative. The thresholds should be chosen so that the 
input always crosses its threshold before the output does. The simulation 
algorithm quite naturally processes events in the scheduled order; allowing a 
negative delay might require backing-up a previously processed event 

(2) The output threshold for a circuit should be chosen without regard to its use, 
allowing a single threshold to be chosen for all inputs and outputs. In that case, 
only one delay computation is needed for each signal transition. 

Though these criteria are not compatible in general, they can both be met for the digital circuits of 

interest here. 

To simplify the analysis below, will restrict the class of input waveforms considered. In his work 

on waveform bounding, Wyatt [Wyatt83] observes that the transfer functions characterizing digital MOS 

circuitry meet certain criteria which guarantee that 

// two monotonic trial waveforms are chosen that bound the actual input waveform 
(which also must be monotonic), then the response of the circuit to the trial waveforms 
will bound the actual output waveform. 

Thus one can choose computationally convenient input waveforms, eg., simple voltage ramps, and 

determine the bounds on the propagation delay by analyzing ramps that bound the true input 

waveform. 

3.4. Propagation delay: logic gates 

In order to explore the timing behavior of MOS logic gates, this section analyzes the behavior of 
an nMOS inverter with a simple voltage ramp on its input. The analysis is based on the first-order 
equations for the component devices, presented in the previous section. The derivation is easily 
extended to more complex gates by adjusting the parameters of the inverter's pulldown to model the 
net pulldown-path resistance of the currently active pulldowns in the complex gate (see section 3.4.4). 
The derivation also applies to CMOS logic gates; the analysis of the low-to-high transition caused by a 
p-channel pullup is very similar to the high-to-low transition caused by an n-channel pulldown. For 
simplicity, only nMOS gates are considered below. 

For the purposes of the analysis, the inverter output is connected to a fixed capacitance that 
models the load driven by the inverter. 
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Figure 3.9. Inverter circuit to be analyzed 



At each moment, the output voltage and the current charging/discharging the load capacitance are 
related by 



iload = Cjoad 



dv, 



out 



dt 



(3.11) 



Unfortunately, this differential equation is hard to use as it stands because iioai is a function of both 
Vout and /. However, if one can find a suitable approximation for iioad that removes the dependency 
on vout, then the change in output voltage over a given time period can be determined by integrating: 



Qoadi^Vout) = J itoad(t) dt 



(3.12) 



The time needed for v out to change a specified amount is calculated by first performing the integration 
and then solving the resulting equation for t. This suggests the following plan of attack: 

(i) Find suitable approximations for //««/ to remove the dependencies on v out . 

(ii) Compute the output transition time using equation 3.12. 

(iii) Subtract from (ii) the input transition time, giving the actual delay from input to 
output. Rearrange the answer into an RC term (what RSIM predicts) and an 
error term. 

This discussion starts with a small digression on choosing the appropriate threshold voltage. 



3.4.1. Choosing the input/output threshold 

To see if one can choose a single logic threshold and still guarantee that the predicted delay is 
never negative, it is useful to consult the voltage transfer curve for an inverter: 
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Figure 3.10. Voltage transfer curve for inverter 

The transfer curve shows the static behavior of the inverter; for any given input voltage, it tells what 
the output voltage must be for the pullup and pulldown currents to balance. If the input changes 
rapidly enough, the output voltage may lag behind. If the input is going from low to high, then the 
transfer curve shows the minimum output voltage for a given input voltage; for a high-to-low input 
transition, the transfer curve shows the maximum output voltage for a given input voltage. 

Since it is desirable for the input and output thresholds to be the same, the input/output 
threshold voltage v^est, is chosen to be the point on the transfer curve where v/„ = v out .f This means 
that during a low-to-high input transition, if v/„ < v t hnshi then Vo« > v t krah> no matter how fast or 
slow the transition. In other words, the propagation delay is never negative. A similar argument 
applies for the other transition. To estimate v t hresh, first notice that at the region II -region III 
boundary, 



V/n = Vte + 






and v out = 1 - | vtd \ 



(3.13) 



If R =4, then v/„ = .44 and v out = .4, and so v t hresh is in region II Oust barely). In this region the 
pulldown is in saturation and the pullup is in the linear region: 



-y-(v/ n - v te ) 2 - Kpu(\v t d\(\-v ut) Y^~^ 

tThe same choice of threshold has been made in several other simulators [Koppel78, NahmfJO]. 



(3.14) 
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Sctting vj„ = v out = ^thresh* and solving for \ t hresh yields v t hresh = -439 — close enough to the II — HI 
boundary that the distinction is not important. 



3.4.2. Low-to-high output transition time, t_«.. 

To calculate t p n u an approximation for itoad is needed, iioad is just the difference between the 
pullup current (i pu ) and the pulldown current (ipj), so one strategy is to approximate the current 
through each component individually. Recall that v t f, r esh is near the region II -region III boundary of 
the inverter's voltage transfer curve, and notice that the part of the transition involved in the 
prediction (y out rising from to v t hrcsh) takes place almost entirely with the inverter operating in 
regions III and IV. This means that the pullup is in saturation, Le., 



Ipu — 



_ Kpu 



|Vtf| 



2_ 



'max 



(3.15) 



Choosing a specific approximation for ipj is not as straightforward. However, a good starting point is 
an approximation of the form shown in the following figure. 




max 



l off 




a off 

(b) resulting approximation for i Joa( j 



(a) approximation for Lj 

Figure 3.11. Approximation ofi^for t ,, calculation 

i ff is the time at which vj„ = v te . At this point in the development, there is not much one can say 
about t a , the time at which the pulldown current first starts to decrease. Certainly t a = t ff is an 
upper bound (resulting in a step function for ipj). Similarly, t a = is a lower bound since that is the 
time when the input voltage first changes. The choice of a specific value for t a will be discussed later. 

With this approximation, the output transition time, if,, is given by 



f * 

C loadiy thresh) = J Q iload(t) dl 



(3.16) 



where 
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iloadO) = 



*max( 



'max 





' < U 


1 - t a . 
toff ~ 'a 


ta<t < toff 




toff <t 



(3.17) 



Solving equation 3.16 for tf, yields 



lh = 



RpuQoad + jUoff + t a ) th > hff 

1 
llRpuCjoaditoff-ta)] 1 + I a t h < t ff 



(3.18) 



where R pu = 



Vthresh 
'max 



. Recalling that t p i h = t h - t inpuu 



tplh = 



tplh >: hff - ti 



input 



RpuCload + -j(toff + t a ) — ti n p U t 

1 
[2RpuC[oad(toff — t a )] + la ~ tjnput tplh < t ff — Unput 



(3.19) 



The following figure plots t p u, as a function of imput- Note that there is a relationship among the 
values of t mput , t ff, and t a . For this plot, a linear relationship is assumed for the values. Their exact 
relationship is determined by the shape of the input waveform, a topic pursued below. 
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Figure 3.12. t » as a Junction oft 



input 



Several interesting observations can be made. When the input is a voltage step, t jf, t a , and ti npux are 
all zero, so t p u,:step = RpuCload, i-e., a simple RC time constant — precisely the prediction made by 
the RSIM model. 
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To sec what happens when the input is not a step, notice that 

tplh < RpuQoad + -~UoJf + t a ) - tmput (3.20) 



since 



[IRpuCloadboJf-ta)} 1 + t a ~ Unput < RpuQoad + -jUoff + 'a) ~ Unput (3.21) 

when tpih > t ff - tmput. (This can be verified by comparing the derivatives of the two sides of the 
inequality or by simply extending the linear portion of the t p ih curves — those portions above the 
dotted line — in the plot above.) Equation 3.20 looks like the response for a step input delayed by an 
amount that depends solely on parameters of the input waveform. 

Figure 3.12 provides some insight into the choice of an appropriate value for t a - From the plot, 
one can see that t p ih eventually goes to zero for some choices of t a , but increases indefinitely for other 
choices. By determining whether tpih goes to zero in an actual circuit, it is possible to narrow the 
range of choices for t a . If the input changes slowly enough, one expects the output voltage to follow 
the voltage transfer curve very closely. (This is essentially the definition of the voltage transfer curve.) 
Thus, when v/„ = v,/, r «/,, it follows that v out = vthmh since v,hresh is the balance point of the inverter. 
This implies t p th - for sufficiently slow input transitions. 

Examining the bottom term of equation 3.19, one can see that t p ih is zero for slow input 
transitions only if t a < UnputA In other words, if t a > tmput, the predicted propagation delay can 
never be zero; the prediction will be longer than the true propagation delay. Thus, it is possible to 
rewrite equation 3.20 using t a = U„p Ut and still preserve the inequality. 

tplh < RpuQoad + j-Uoff ~ Unput) (3.22) 

This equation can be simplified still further with some assumptions about the input waveform. 



fThe bottom term has the form [%\)\ + g(t) which reaches zero for large t only if g(t) is negative. 
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Figure 3.13. Assumed input waveform for low-lo-high output transition 

If the input is a falling voltage ramp which starts at / = and reaches zero at t = 8, then 
Unpui = (1 - vthmh)8 and t ff = (1 - v te )8. Substitution into equation 3.22 yields 



tplh < RpuQoad + ^thresh ~ V, e ) = RpuQoad + (0.15)5 



(3.23) 



where the numerical estimate is computed for a typical 5/i nMOS process. Thus rsim potentially 
underestimates t p ih for a logic gate with a slow input transition (a large 8). As 8 decreases (a faster 
input transition), the accuracy of rsim's predictions increases. Note that Rpu is exactly the resistance 
measured by the experiment proposed in figure 2.16(a). 



3.4.3. High-to-low output transition time, t-ui. 

In the previous section, the equation for t p ih was developed by overestimating the current 
through the pulldown, leading to an upper bound for the low-to-high propagation delay. The same 
technique can be used to estimate t p /,i, the high-to-low transition time. In this case, however, one 
wants to underestimate the pulldown current (and overestimate the pullup current) to find an upper 
bound for t p u-\ 

For the portion of the high-to-low output transition which is of interest (y ou t falling from 1 to 
vthresh), the pullup is in its linear region. As before, i pu can be approximated by the pullup's 
saturation current; an overestimate, but one consistent with the goals of this section. Also as before, 
estimating the pulldown current is difficult. Consider the following diagram of various load lines for 



tMost MOS circuits use multiple-phase clocking, with simple logic circuits between latches controlled by different 
phase clocks. This means that circuit performance is determined by the maximum propagation delay through the 
simple logic: this is the only quantity estimated by RSIM. Other technologies (TTL, ECL) support single-clock, syn- 
chronous designs in which minimum propagation delays can be very important for correct circuit operation. This is 
rare in MOS circuits, and such designs are not .supported by RSIM. 
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the pulldown. The trajectory of a load line shows ipj as a function of time: 
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Figure 3.14. Load lines for the pulldown for various input transitions 

When the input transition is fast in comparison to the output transition, the pulldown turns on to its 
maximum current capacity (the upper load line in figure 3.14). As v out drops, the current in the 
pulldown also decreases, and the trajectory follows the maximum current curve until it reaches v t hmh' 
When the input transition is slow, the output voltage falls fast enough to keep the pulldown and 
pullup currents balanced (the bottom load line in figure 3.14), so the trajectory for ipd follows the ipu 
curve. 

In the proposed approximation, i^ rises linearly to a maximum current equal to the actual 
current through the pulldown when v m = 1 and v^,, = v thresh- This certainly underestimates the 
actual pulldown current for a fast transition, and is roughly equal to the pulldown current for a slow 
transition, except for the last part of the transition. Fortunately, in this portion of the transition (near 
the threshold), a small change in the input voltage causes a large change in the output voltage, so only 
a small amount of time is actually spent in the overestimated part of the transition. This 
approximation leads to the following estimate for iioad'. 
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Figure 3.1S. Estimate ofij^jfor t ^ calculation 
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where t\ is the time at which v/„ = 1 and /'max is the maximum pulldown current minus the pullup 
current. 

'max = Kpdi\ - Vie - '™ ^thresh f~^ v 'd^ < 324 ) 

As before, t a will be chosen to ensure that the estimate is an upper bound to the actual propagation 
delay. 

The derivation of a formula for t p ih and the choice of t a is very similar to that of the previous 
section, so only the conclusion is presented here: 

tphl < RpdQoad + yOl ~ tlnput) (3.25) 

where Rpj = : — — . If the input is a rising voltage ramp that starts at / = and reaches 1 at 

'max 

/ = 8, then 

tphl < RpdQoad + yd ~ thresh) = RpdQoad + (0.28)5 (3.26) 

As before, RSIM potentially underestimates t p hi for a logic gate with a slow input transition (a large 8). 
As 5 decreases (a faster input transition), the accuracy of rsim's predictions increases. Note that the 
experiment proposed in figure 2.16(d) does not measure Rpj. Instead, the experiment measures the 
average resistance associated with the fast input transition shown in figure 3.14, omitting the 
contribution of the pullup. This resistance is less than Rpd, although is it not clear by how much. This 
net result is a tendency to underestimate t p hi by the original rsim model, calibrated as in Appendix 2. 

3.4.4. Why analyzing inverters is sufficient 

The results of sections 3.4.2 and 3.4.3 were developed for the nMOS inverter. This section 
extends the results to nand and nor gates as well. Equations are developed for the amount of 
current flowing through the nor and nand pulldown configurations and then the results are 
compared with the equations for a simple inverter. 
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(a) NOR pulldown configuration (b) NAND pulldown configuration 

Figure 3.16. Currents through NOR and NAND transistor configurations 

The propagation delay of a nor gate with a single active pulldown is exactly that of an inverter. If 
both pulldowns are active simultaneously, i nor = i\ + ii, since the current through each pulldown can 
be computed independently. Thus, when both pulldowns are on, and their gates are at the same 
voltage (Le., logic high), the total current through the pulldowns is 



l nor — 



(*1 + K2Xv« 



Vout » 
V/e =-)v itf 



2 * V ' n " Vte ' 



(linear) 
(saturated) 



027) 



which is equivalent to the current through a single pulldown sized so that 



K single pulldown — *1 + *2 



(3.28) 



As one might expect, this is the formula for combining two conductances in parallel. 

The analysis of a nand gate is more complicated because the currents through the two 
pulldowns are not independent The currents through the pulldowns are given by 



'l = 



Kl(v;« - v m - v, e - XW - v m ) (linear) 






(saturated) 



(3.29) 



<2 = t2(vm - v t e ~ -y^m (linear) 



(3.30) 
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where v m is the voltage of the node that is common to the two pulldowns. Two equations are needed 
for the top pulldown, because the pulldown may be in either its saturated or linear region, depending 
on the relative values of v ; „ and v out . Only one equation is needed for the bottom pulldown, because 
it is assumed that v m is never large enough for the bottom pulldown to become saturated. In the 
steady state i\ must equal ii. This gives a set of equations to solve for v m ; substituting the solution 
into equation 3.29 yields the net current through the pulldown. The result is 



hand — 



-(v/ n ~ v re z-Pout (linear) 



"1 + «2 2 



*1*2 



2(ki + K2) 



(v/n - vk) 2 (saturated) 



(3.31) 



This is the same amount of current as that for a single pulldown sized such that 

Kl ICO 

K single pulldown = ; (3.32) 

K\ + K2 

Again, as one might expect, this is the formula for combining two conductances in series. 

The conclusion to be drawn from equations 3.28 and 3.32 is that the current flowing through a 
parallel or a series configuration of pulldowns can be modeled as the current flowing through a single 
pulldown of the appropriate size. This means that the formulas for the propagation delay through an 
inverter are directly applicable to more complex logic gates. 

3.5. Propagation delay: source-followers and pass transistors 

The analysis which follows is not very rigorous; its purpose is to show that the RSIM models for 
logic gates overestimate the propagation delay through a circuit containing pass transistors and 
source-followers. Although better estimates would be desirable, the existing models are sufficient 
given the relatively constrained use of these components in actual circuits. 

A source-follower (so called because the voltage of the source node "follows" the voltage of the 
gate node) is an n-channcl device with its drain connected to VDD. 
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Figure 3.17. Source-follower circuit configuration 

In the circuit shown in figure 3.17(a), the output voltage of the source-follower cannot rise higher than 
a threshold drop below the voltage of its input Thus, the maximum voltage for the output of a 
source-follower is 1 - v te ; this is why a depletion pullup (which can drive its output to VDD) is 
preferred in an ordinary logic gate. 

Since a source-follower can only pull a node up, only the propagation delay associated with the 
loW-to-high output transition needs to be analyzed. (A rising output transition corresponds to a rising 
input transition; unlike most logic circuits, a source-follower does not invert the sense of its input). 
During a very slow input transition, the output voltage tracks the input voltage, and the propagation 
delay is equal to the time needed for the input to rise from v t hresh to y thresh + v«. For a ramp input, 
this implies t p u, = (v«)fi = (0.14)5 where S is the time needed for the input to rise from to VDD. 

For a fast input transition — one where the input reaches 1 before the output reaches vomsh — 
the current through the source-follower can be approximated as shown in figure 3.17(b). t on is the 
time at which v,„ = \ te , and /i is the time at which v,„ = 1. /"max is estimated by the average current 
flowing through the source-follower during the transition: 



Vthresh 



*sf ,, 
'max r— U ~ ^te - 



^thresh 



(3.33) 



One can calculate t p i n using an approach similar to that of section 3.4.2; the result is 

tplh = RsfQoad + yfrl + Ion) ~ (input 



(3.34) 



^thresh 



where Ry = 

'max 

equation for t p ih is 



If the input is assumed to be a voltage ramp with transit time 8, the final 
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J RtfCtoad + (0.35)5 (small S) 
l P' h = J (0.14)5 (large 5) (135) 

A source-follower is usually used to drive a large output load, so when 5 is small, the RC terra 
dominates. This suggests that the two pieces of the equation can be reconciled as 

tplh = RtfCiov, + (0.14)5 (3.36) 

This equation is very similar to 3.23, which describes t p n, for an ordinary logic gate, so no special 
handling is needed for a source-follower. 

In the analysis of section 3.4 and the first part of this section, each examined device had 
essentially two terminals, since one terminal of each device connected to VDD or GND. Moreover, 
input signals were applied to the gate node of the device. The analysis now turns to circuits that 
contain three- terminal components, ie, pass transistors. A pass transistor is any transistor not 
configured as a pulldown, pullup, or source-follower; some examples of circuits containing pass 
transistors are presented in section 3.2. 

There are two basic configurations for a pass transistor: one with the gate node as input, and the 
source and drain as outputs; the other with the source/drain as input, and the drain/source as output 
(assuming that the gate is at logic highf). As the following table shows, when the gate of a pass 
transistor is the input, the pass transistor behaves like one of the components analyzed earlier. 

pass device acts as analyzed in 



input 


source or 


(gate) 


drain 


falls 


rises 


falls 


falls 


rises 


falls 


rises 


rises 



pulldown turning off section 3.4.2 

enhancement pullup turning off — 

pulldown turning on section 3.4.3 

source-follower beginning of this section 

The second pass transistor configuration presents a new analysis problem. Assume that the drain 
connection is the input (which remains constant) and that the source node undergoes a transition. If 
the drain undergoes a step transition from high to low at time 0, and the source follows, [Horowitz83] 
suggests the best estimate for the voltage of the source is 

vsourceit) = 1 - tanh(- — '- — ) (3.37) 

K pass*-load 



t Although the analysis focuses on n-channcl pass transistors, it can be extended to p-channel pass transistors in a 
straightforward manner. 
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This equation can rearranged to give the propagation delay: 

tphl = Rpass Cload tanh _1 (l - Vthresh) = (063)R pass Cload 



(3.38) 



Similarly, Horowitz suggests the best estimate for the voltage at the source, given a rising step at the 
drain, is 



v source(0 — 1 



1 



(3.39) 



*pass 



Cload 



+ 1 



which gives 



Vthresh 



tplh = Rpass Cload-, = (0.19)Rpass Cload 

^ 1 - Vthresh 



(3.40) 



In both cases, the RC time constant of the rsim model overestimates the propagation delay of a step 
input For a slow input transition, the source voltage tracks the drain voltage, resulting in essentially 
zero propagation delay. (In this respect, the delay through a pass transistor is similar to the delay 
through a logic gate.) Although no direct evidence is provided here, the circumstantial evidence 
indicates that the predictions for propagation delay through a logic gate are upper bounds for the 
propagation delay through a pass transistor, regardless of the speed of the input transition. 

Pass transistors are often used in series within a switching-logic implementation of multiplexors, 
etc. 



B 

i 



c 

i 



T T 



D 



input o ' R1 ' | ■ rj ' | R3 T R4 

-^Cl =J=C2 - L -~ 



a =pc4 
Figure 3.18. Pass transistors connected in series 



Horowitz extends his estimates for the voltage of a particular node e to a chain of pass transistors by 
replacing the RC terms in equations 3.38 and 3.40 with 



TDe = ^,RkeCk 

k 



(3.41) 



where Rke is the resistance of the path common to node e and node k. Thus, his estimate for the 
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delay associated with a falling transition on node D of figure 3.18 is 

tpu = (0.63)[ RiCi + (Ri+R 2 )C 2 + (Ri + Ri+RJKi + (Ri+R 2 +R3+Ra)C4] (3.42) 

If all the resistances are equal, and all the capacitances are equal, tpu = 63RC . The RSIM estimate 
for the same transition is 

tphl = (2>*X2C*) = 16RC (3.43) 

which overestimates the delay by a considerable margin. For a long chain of pass transistors, the RSIM 
estimate is very pessimistic; fortunately, performance constraints limit designers to chains of length 
four or less. Nevertheless, performance prediction for a circuit containing pass transistors is clearly an 
area where rsim can be improved, f 

3.6. Implications for the RSIM model 

The analysis of the propagation delay of logic gates indicated that an RC time constant is a very 
good estimate for the delay of a gate when the input waveform is a voltage step. The analysis 
concludes that a simple RC time constant underestimates the actual propagation delay if the input 
waveform is assumed to be a voltage ramp with a rise/fall time of 5. More accurate estimates for the 
propagation delays are 

tplh < RpuChMd + bin:Jall 

IpM ^ RpdQoad + hm:me * " ' 

where 

&in:Jall = -^iythresh ~ v le )S « (0.15)5 

1 (3.45) 

bin-.rise = yd - Vthresh)8 * (0.28)5 

are offsets that depend only on parameters of the input waveform. Section 3.5 shows that these 
equations are satisfactory upper bounds on the propagation delay through other (non-gate) circuit 
configurations. 



tit is straightforward modification of RSIM to make it use equations 3.38 and 3.40 instead of the lumped RC formu- 
la. However, these equations only apply to circuits containing a single driver; until the theory is extended to include 
multiple-driver configurations, it seems safest to use the conservative lumped RC approximation. 
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The computation of the propagation delay would be easier if it involved only the t of the output 
node. A rearrangement of the time accounting accomplishes this: 

(1) Report the time of the output transition as happening at t time units after the 
input transition. 

(2) Schedule the event associated with the output transition for t + A time units 
after the input transition where A = (0.28X/ota/ rise time) for rising transitions, 
and A = (Q.15)(totaI fall time) for falling transitions. 

In other words, the effects of the input rise/fall time are factored in when the input transition is 

scheduled, so the A terms in equation 3.44 can be omitted when computing subsequent t's. This 

rearrangement is illustrated in the following figure. 



IN 



IN 



-^rc + a^I*- ""^^J rc •*" 

OUT I 



OUT 

(a) according to equation 3.44 (b) proposed rearrangement 



Figure 3.19. Rearrangement of time accounting for transitions 

The total rise and fall times of a transition are related to the RC time constant of the transition. When 
the input is modeled as a ramp, the total rise/fall time is (2.3)t since t is measured using v t hresh =0.44. 
As a result, the transitions of a given node can be handled in the following way: 

(1) Compute the RC time constant (t) for the node. 

(2) Report the time of the transition as r time units in die future. 

(3) Schedule the associated event at 

(1.6)t time units in the future for a rising transition, or 
(1.3)t time units in the future for a falling transition. 

Note that 1.6 = 1 + (2.3X0.28) and 1.3 = 1 + (2.3X0.15). This scenario assumes that all 

consequences of a rising transition involve a falling transition, and vice versa. This is not always the 

case for a source-follower or a pass transistor, but the error involved (the difference between 1.6 and 

1.3) is not large enough to be significant. The old scheme (accounting for the input transition time 

during each delay computation) can be used if desired. 

Now that the model incorporates some information about the input waveform, it is interesting to 
review the examples presented in section 2.4. First the PLA calculations: 
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As one can see, rsim's estimates are now better, and they overestimate transition times with reasonable 
consistency. (One expects overestimates because of the inequality in equation 3.44). The estimate for 
Case 1 is 11% greater man the spice prediction; for Case 2, 2% greater. The story is similar for the 
OM2 data path example: 
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rsim's prediction is 15% greater than that of spice Note that the event for node B is scheduled using 
the rule for a rising transition — formulated assuming that any consequent transitions will be falling — 
even though node C is also undergoing a rising transition. This accounts for much of the overestimate 
by rsim. 

In conclusion, this chapter shows justification for the linear transistor model, especially if all 
waveforms can be modeled as steps. Of course, transitions are not steps in actual circuit operation; 
this fact motivated changes to the linear model, still allowing it to provide acceptable predictions of 
circuit behavior. 
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CHAPTER FOUR 



Simulation Using a Linear Network Model 



This chapter focuses on various rsim implementation issues. The first section presents a detailed 
description of the simulation algorithm, with step-by-step accounts of the charge-sharing and final- 
value computations. Several techniques for speeding up the computations are described in the second 
section. The third section outlines some mechanisms available to the user for forcing the value and 
timing predictions for given nodes. The chapter concludes with an evaluation of the strengths and 
weaknesses of RSIM. 

4.1. The RSIM simulation algorithm 

RSIM uses the following simple recipe for simulating a circuit: 

(i) Accept new input values from the user. Perform the new-value computation 
(figure 4.2) for each new input value; this propagates the new value to nodes 
connected to the input by the source/drain connection of a transistor switch (see 
figure 2.14(a)). In addition, schedule the appropriate event so that any 
transistors affected by the new input value will be processed. 

(ii) Process events from the event list, stopping (1) when the event list is empty, (2) 
when a node the user is tracing changes value, or (3) when the specified amount 
of simulated time has elapsed. 

(iii) Loop back to (i) to accept new inputs. 

The main loop of the simulator (step (ii) above) is described in the following figure. The node 
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associated with each event is assigned its new value, and all stages affected by the new value are 
located and processed. (An affected stage is one that contains a source/drain node — called a seed 
node — of a transistor which has the event node as their gate.) The processing of a stage has two 
steps: first a charge-sharing computation for the stage, then a calculation of the final value of each 
node in the stage. Before each of the two steps, the COMPUTE flag of each seed node is set to indicate 
that the stage containing the seed node needs processing. A stage is processed only if its seed node 
has the compute flag set; as part of the processing, COMPUTE flags for nodes in the current stage are 
reset This mechanism ensures that a stage is processed only once, even if it contains more than one 
seed node. 

while event list not empty { 

n : = node associated with first event on event list 

remove first event from event list 

set n's value to the value specified by the event 

/* do charge-sharing compulation for each affected stage [see section 4. 1.1 J */ 
for each transistor with n as gate node, set COMPUTE flag for source 
for each transistor t with n as gate node 

if t has just turned on and compute still set for source node 
do charge-sharing computation for source 

/* do new-value computation for each affected stage [see figure 4.2] */ 

for each transistor with n as gate node, set compute flag for source and drain 

for each transistor with n as gate node { 

if COMPUTE still set for source, do new-value computation for stage containing source 
if COMPUTE still set for drain, do new-value computation for stage containing drain 

} 
} 

Figure 4.1. Main loop ofRSIM algorithm 

Note that the charge-sharing computation deals only with the source stage of each transistor, but the 
final-value computation deals with both the source and drain stages. This is because the charge- 
sharing calculation only deals with transistors known to be on; therefore, the source and drain belong 
to the same stage, and a stage computation involving the source automatically involves the drain. 

The procedure for calculating the final value for each node in a stage is outlined in the following 
figure. 
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initializc connection list to have starting node as only element 

set pointer to beginning of connection list 

if starting node is an input, input_ found : = true, else input_ found : = false 

/* find all nodes in current stage */ 
while pointer not at end of connection list { 
n : = node currently pointed at 
for each "on" transistor with source connected to n { 
if drain is an input, input_found : = true 
else if drain not on connection list, add drain to end of list 

} 

advance pointer to next list element 

} 

/* compute new final value for each node in stage */ 

if no inputs found, all done (charge-sharing has computed the correct value) 

else for each node on connection hst { 

if node is an input, do nothing (its value is set by user) 

compute final value for node {section 4.1.2] 

reset visited flag (set by final-value computation) for each node on connection list 

reset node's compute flag 



} 



Figure 42. Subroutine to compute new final value for every node in stage 



The details of the charge-sharing and final-value computations are presented in the next two 
subsections, followed by a description of event management in RSIM. 

4.1.1. Charge-sharing computation 

When a transistor turns on, its source and drain nodes become part of the same stage. As 
explained in section 2.2, if the voltages of all the nodes in a stage are not already identical, they 
become so through charge sharing. In order to calculate the charge-sharing value for each node, RSIM 
computes three summary capacitances from the capacitances of each node in the stage: 

Chtgh total capacitance of nodes with current state of logic high. 

C/ ow total capacitance of nodes with current state of logic low. 

C x total capacitance of nodes with current state of X. 
The summary capacitances are used to compute the charge-sharing value for the stage, as specified by 
equations 2.3 and 2.4: 



charge- sharing value = 



Chigh + c x 



Cbw + Chigh + C x 



1 r + C r S " +r > v *** < 41 > 

(-low + t. high + C x 
X otherwise 
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An event is scheduled for each node, specifying an immediate transition to the charge-sharing value. 
(See section 4.1.3 to find out what happens to new events.) 

ITie charge-sharing computation is outlined in the following figure. The procedure performs a 
tree walk of a stage starting with a node passed as an argument from the new-value procedure. Since 
the nodes in the stage do not require processing in a particular order, the procedure is implemented 
without recursion. 

initialize list to have starting node as only element 
set pointer to beginning of list 
reset capacitance accumulators 

/* visit all nodes in stage, compute summary capacitances */ 
while pointer not at end of list { 
n : = node currently pointed at 
add capacitance of n to appropriate accumulator 
for each "on" transistor t with source connected to n { 
if drain is an input or statk(t) > maxres, do nothing 
else if drain not on list, add drain to end of list 

} 

advance pointer to next list element 

} 

/* make each node in stage have charge-sharing value */ 
compute charge-sharing value using equation 4.1 
for each node on list { 

reset node's COMPUTE flag 

schedule immediate transition to charge sharing value 
} 

Figure 43. Non-recursive routine for charge-sharing computation 

If the resistance of a transistor is large enough, its source and drain nodes might not share charge — at 
least not very quickly. The user can specify a maximum resistance parameter (maxres) that controls 
the scope of the charge-sharing calculation; the traversal of nodes in a stage stops at transistors with a 
resistance greater than maxres. The compute flag indicates to the main RSIM loop which stages have 
been processed by the charge-sharing calculation; the main loop uses the flag to ensure that the 
charge-sharing calculation is performed only once for each stage. 

Equation 4.1 leads to incorrect results when the surrounding network contains X transistors 
(transistors with gates of X). A portion of the network that can be reached only through X transistors 
might not be connected to the original node at all, and so should not make an active contribution to 
the node's charge-sharing value. An alternative (suggested by Dave Gross) is the use of capacitance 
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intcrvals to accumulate the contribution of X connections. In this scheme, the capacitance 
accumulators have interval values, e.g., Chigh = [Ci,i g h-min, Chjgh-max]. The minimum value is the 
total capacitance of nodes guaranteed to be connected to the current node; the maximum value also 
includes the capacitance of nodes only reachable by X transistors. A separate charge-sharing 
computation occurs for each node in the stage, as outlined in the following figure. 

if node is input, Chigh - Ci ow = C x = [0,0] 
else { 

\oca\_Chigh := \ocd\_Cfow '■= local_C x := {0,0] 
add node's capacitance to max and min of accumulator for node's value 
set visited flag for current node 

for each "on" transistor, t, with source connected to current node { 
if drain does not have visited flag set { 

recursively determine parameters for drain node 
if value of gate node for t is not X { 

local_Chigh -min : = \ocal_Chigh .min + Chigh Jnhi 
local_C/ ow ,.min := local_C/ OH ,.min + Ci ow .min 
local (7* .min := local C x .min + C^Jnin 
} 

\ocai_Chigh .max := local_Oi#,.max + Cw^.max 
local_C7 olv .max := local_C/ ow jnax + C/ 0M -.max 
local_C x .max := local C x .max + C x .max 
} 
} 
set Chigh — local_Chjgh . and so on 

Figure 4.4. Subroutine to compute capacitance intervals 

The results determine the maximum and minimum node voltage, which determine the charge-sharing 
value for the node: 



charge-sharing value = 







Chigh -max + C x .max 
Ci ow .min + Chigh -min + C x .min 



, Chigh. nun ^ ,. -x 

Ci ow .max + Chigh.max + C x .max * 

X otherwise 



Capacitances for nodes connected by X transistors contribute to the final value only in a negative 
sense, Le., they may cause a node to go to X, but never contribute to a value of or 1. Leaving the 
visited flag set as each new node is discovered ensures that each node is visited only once. After 
completing the charge-sharing computation for a node, its COMPUTE flag is reset; the visited flags for 
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all nodes in the stage are also reset, in preparation for the next node's computation. 

One disadvantage of the interval approach is that a separate calculation is performed for each 
node in the stage, whereas the original scheme required only one calculation per stage. In addition, 
the interval calculation must be performed by a recursive tree walk to ensure the correct handling" of X 
transistors. Fortunately, this computation can be merged with the tree walk described in the following 
section, so the incremental cost is fairly small. 

4.1.2. Final-value computation 

The final, driven value of a node is determined by the resistance of paths from the node to 
various inputs. As we saw in chapter 2, a convenient way to characterize these paths is to calculate the 
Thevenin equivalent for the portion of the network that can be reached from the node of interest 
Equation 2.6 relates the final value of a node to V t hev, the Thevenin equivalent voltage. The time 
constant for a transition in the value of a node is also determined by the surrounding network; the 
necessary parameters can be computed during the Thevenin calculation. 

For computational convenience, RSIM actually computes RH and RL, the resistances of a resistor 
divider that represents the effect of the surrounding network. 

[RH RH 1^ ^ — net resistance of all paths to VDD 
1 Lj.RL^^ z — net resistance of all paths to GND 

Figure 4.5. characteristic resistor divider for a node 

RH and RL might be resistance intervals (RH - \RH U RHh] and RL = [RLi, RLh\) if there are X 
values in the surrounding network. The Thevenin equivalent voltage is easily calculated from the 
characteristic divider: 

^■^'• F »i"'izjrW-«^WJ <4 - 3) 

For example, the lowest possible voltage is calculated using the least resistance to GND (specified by 
RLi) and the greatest resistance to VDD (specified by RH/,). Couching the computation in terms of 
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the characteristic resistance is advantageous for several reasons. Resistances to vdd and GND 
represent, in a natural way, the connections made by MOS logic, as shown in chapter 3. With the aid 
of some simple rules, it is easy to incrementally analyze any MOS network in terms of its component 
resistances. Because resistances are directly related to the implementation, they can represent certain 
circuit configurations — eg., short circuits (RH = RL = 0) — that cannot be simply characterized 
using the Thevenin equivalent The remainder of the section describes a tree walk algorithm to 
compute the parameters needed for determining a node's value and for scheduling the appropriate 
transition. 

The computation of RH and RL proceeds by tracing paths to the inputs that are reachable from 
the node of interest, and then calculating the resistance of each path, starting at the input and working 
back toward the original node. Two rules are helpful for calculating path resistance. The first rule 
specifies the apparent path resistances when a divider exists on the other side of a resistor: 





w > w lA),Ahl 

X — /y\y\y — © 



(a) initial network (b) approximation 

Figure 4.6. Reduction rule for resistor divider with series resistor 

The parameters for the apparent resistances (A and B in figure 4.6(b)) cannot be determined exactly, 
an approximation is therefore necessary. Appendix 3 explains why this is so, and derives the following 
formulas for the approximation: 

Ai = Pi + Ri + Ri%- A h = P h + Ri^~ + Ri%- 
Qi Pi Qi 

Bj = Q l + R l + Ri^- B h =Q h + R,%- + R t <k 

n Qi Pi 

The second rule is much simpler; it indicates how to merge the resistances of two separate paths to 
obtain the net resistance for both paths: 
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[C.D] 



[A H P. B || Q] 
[R.S] [C || R. D || S] 




(a) dividers for two parallel paths 



(b) resulting divider 



Figure 4.7. Reduction rule for combining two parallel paths 

To compute the Thevenin equivalent for a particular node, one starts by locating all conducting 
transistors connected to that node and then recursively analyzing the network on the other side of 
each of the transistors. Each node is marked as its analysis begins; recursive calls ignore portions of 
the network involving marked nodes. This keeps the analysis expanding outward, eventually 
terminating at a dead-end (no paths leading to unmarked nodes) or an input These particular circuits 
are easy to analyze, as shown in the following figure. 



RH = oo 



RL = 




RH =0 



RL= 00 




RH = 00 



RL= 00 




(a) low input (GND) (b) high input (VDD) (c) dead-end 

Figure 4.8. Characteristic dividers for input nodes and dead-ends 

The resistance of paths leading from a particular node are combined using the two reduction rules 
above. Using the first rule, the results of a recursive call (shown as P and Q in figure 4.6) are 
combined with the resistance of the conducting transistor leading to that piece of the network (shown 
as R), to yield the net resistance of the path. This resistance is combined with the resistances from 
other recursive calls using the second reduction rule. When all paths have been accounted for, the 
analysis for the node is complete. The resulting divider is the desired answer, or, is used as part of the 
analysis of some other node if the analysis was performed because of a recursive call. The process is 
diagramed in die following figure. 
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subnet 



C subnet J — W\r 
C subnet J — ^M — 



(a) initial network 



t — Wv — 
t — W\ — 

|-^AAr— 



(b) after recursive analysis of subnets 




(c) after applying first reduction rule (d) after applying second reduction rule 

Figure 4.9. Network analysis by repeated rule application 

The complete analysis procedure is outlined in the next figure. The results are stored in eight 
global variables: 

RH resistance interval for net resistance of all paths to vdd. Path resistance 
computed using static resistance of each transistor. 

RL resistance interval for net resistance of all paths to gnd. Path resistance 
computed using static resistance of each transistor. 

R V dd net resistance to vdd, computed using the dynamic-high resistance of each 
transistor. Simple series/parallel calculation; paths containing X transistors 
are ignored. 

R gm j net resistance to gnd, computed using the dynamic-low resistance of each 
transistor. Simple series/parallel calculation; paths containing X transistors 
are ignored. 

R x net resistance to all inputs, computed using the dynamic-high resistance to 
high inputs, and dynamic-low resistance to low inputs. Simple series/parallel 
calculation; includes paths containing X transistors. 

Chigh total capacitance of nodes with current state of logic high. 

Ci ow total capacitance of nodes with current state of logic low. 

C x total capacitance of nodes with current state of X. 
If the interval charge-sharing calculation is merged with this calculation, the upper limit of the 
capacitance intervals in the charge-sharing calculation can be used in place of the three capacitance 
accumulators just defined. The procedure also uses four stack-allocated local variables to accumulate 
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the first four quantities listed above, during the calculation for each node. 



Rend — Ry — 



if node is logic low input { 

return with RH = R^ = co and RL 
} else if node is logic high input { 

return with RH = R^u = R x = and RL = R gmi = oo 
} else { 

local_/f v jd : = \oca\_R g „d : = \oca\_R x : = local_RH : = local_RL : = oo 
add node capacitance to appropriate accumulator 
set VISITED flag for current node 

for each "on" transistor, t, with source connected to current node { 
if drain does not have VISITED flag set { 

recursively determine parameters for drain node 

combine statist) with RH and RL using first reduction rule 

combine result with local_RH and local_RL using second reduction rule 

if value of gate node for t != X { 

local_/{ W 4/ : = \oca\_Rydd || (dynhigh(t) + /?«*/) 
\ocd\_Rg„<t : = \ocal_Rg rU i || (dynlow(t) + R gn d) 

local_rt x := local_/? x || (min(dynhigh(t),dynlow(t)) + R x ) 
} 
} 

set Rvdd = local_#vdtf, RH = local RH, and so on 
} 

Figure 4.10. Subroutine to compute parameters of resistor divider 

Marking each node as it is visited (by setting its VISITED flag) avoids cycles and keeps the tree walk 
expanding outward from the starting node. If the network does contain cycles, the subroutine only 
approximates the true resistance to vdd and GND. For example, consider the following logic gate 
where the output (the pulled-up node) is the node of interest: 






(a) circuit containing cycles (b) circuit as analyzed 



(c) circuit as analyzed if marks removed 



Figure 4.11. Analysis of circuit containing cycles 



Since the marks are not removed when the analysis of a path is completed, RSIM treats the cycle as if 
the circuit were configured as shown in the circuit in figure 4.11(b). This approximation results in an 
overestimate of the actual resistances. If a node's mark were removed as the procedure exited, all 
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paths through the network would be explored (as shown in figure 4.11(c)); in this case, the resistance 
would be underestimated, leading to optimistic performance predictions. 

Cycles arc relatively rare in nMOS designs; when they occur, the extra path is often redundant, 
ie„ the circuit is designed to work correctly if any path in the cycle became the sole connection. ITiis 
means the approximation used by rsim is usually not out of line with the designer's intentions. This 
statement holds for CMOS as well, with one notable exception — the CMOS pass gate: 

Abar 



A 
Figure 4.12. A cMOS pass gale 

In this circuit configuration, one device is sized to carry most of the load, and the other exists simply to 
ensure no threshold drop across the gate. In analyzing such a circuit, RSIM arbitrarily chooses the 
transistor that makes the connection; the other transistor's contribution is ignored. This is satisfactory 
if the transistor with the smaller resistance is chosen, but such is not always the case. To correct the 
problem, the transistor list for each node can be arranged in order of increasing resistance; this ensures 
paths of least resistance are examined and marked first Note that this solution only works when the 
paths in a cycle have a length of one transistor (as in the pass gate above). If the paths are longer, 
there is no guarantee that the path of least total resistance will happen to start with the transistor that 
has the least resistance. 

After the various parameters are calculated, the final value of a node can be calculated using 
equations 2.6 and 4.1: 



final value 



V h < vi ow or (old value=0 and RHi = 00) 

1 Vi > v high or (old value=l and RLi = oo) (4.5) 
X otherwise 



The extra clause for "0" and "1" values prevents a node from being unnecessarily forced to X when it 
has no connection to inputs of the opposite logic state. The appropriate event is scheduled ReffC e ff 
seconds in the future, where 



Reff = 



C eff = 
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Rgnd find value = 

Rvdd final value = 1 (4.6) 

R x final value = X 

Chigh + C x final value — 

Clow + C x final value = 1 (4.7) 

Clow + Chigh final value - X 



The disposition of this event depends on the nature of any pending events and the node's current 
value; see section 4.1.3 for the details of event management 

The user has some control over the final-value computation. The time constant for event 
scheduling can be forced to 1, implementing a unit-delay simulation. This is useful when a node value 
is to be calculated using transistor resistances, but transition timing is not important. Another option is 
flagging those events corresponding to transitions to X, where the X value is specifically caused by a 
ratio error (rather than other X's in the network). Such transitions are characterized by RHh < °0 
and RLh < CO; if an X exists in the surrounding network, one or both of these parameters is infinite. 
When a flagged event is processed, the transition is reported to the user as a ratio error. Because the 
error report is delayed until the flagged event is processed, short-lived ratio errors (those caused by 
small differences in propagation delays) are ignored, and the error reports reflect only significant ratio 
errors. Of course, in some designs, even long-lived ratio errors might not affect correct circuit 
operation, so the reporting is optional. 

When RHi - RLi = co, the node is not connected to any inputs, and the charge-sharing 
computation described in the previous section correctly computes the node's final value. Ordinarily, 
the final-value calculation does not schedule any events in this case, but the user can optionally 
request the scheduling of a charge-decay event A charge-decay event sets the node value to X after a 
specified interval which the user can set At first glance, it might seem odd to schedule all decay 
events using the same interval; a more suitable estimate might be based on factors such as the node's 
capacitance, the number of transistors connected to the node, and so on. However, precise predictions 
are not necessarily the most useful here. The actual decay time for MOS circuits is in the millisecond 
range. Since it is unlikely that a simulation spans that long a period of simulated time, a precise 
accounting of the decay time never results in a decay! A more useful approach is based on the 
observation that a designer usually intends for all dynamic nodes to be refreshed every few clock 
cycles. When the decay time is set to an interval slightly larger than the intended refresh rate, the 
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unrcfrcshcd nodes decay quickly, and the user receives a suitable error report. Thus, even a short 
simulation run catches a decay problem. This type of debugging experiment can be much more 
effective than a precise estimate in pinpointing a problem. 

4.1.3. Event Management 

Up to two events can be pending for a node: 

(1) a charge-sharing (CS) event CS events are always immediate events, ie, they 
are scheduled for the current simulated time. 

(2) a final-value (FV) event, scheduled for sometime in the future. 

Thus, up to two transitions are possible for a given node. Each event corresponds to a real transition, 
ie, the new value of a CS event always differs from the current value of the node, and the new value 
of a FV event differs from that of the CS event (or the current node value if there is no pending CS 
event). Since only two transitions can be pending at any moment, newly calculated events must be 
merged with the pending events. Section 2.3 hinted at the issues involved; in general, rsim makes its 
choices based on the principle that the most recently calculated event best reflects the current network 
configuration. Since no information is available that explains why any pending events were created, 
there is little (if any) reason to save a previously-calculated event in preference to the newer one. 

The following figure describes the simple merging rules used by rsim: 

if merging new CS event { 

abort pending CS and FV events 

if new charge-sharing value is different from current node value 
schedule new CS event 

} 

if merging new FV event { 

if new value differs from CS value (or, if no CS event pending, current node value) 
schedule new FV event 
} 

Figure 4.13. Merging a new event with pending events 

A new CS event aborts a pending FV event because a new final-value computation always occurs after 
the charge-sharing computations are complete. Although this approach is simple, it occasionally leads 
to pessimistic predictions. For example, if one input of a two-input nor gate turns on substantially 
before the other, the propagation delay is actually determined by the time of the first input's 
transition. With the merging scheme outlined above, the two events scheduled at the time the second 
input turns on cause other events to be aborted — those scheduled because of the first input's 



74- 



transition. This occurs even if one of the aborted events is scheduled for an earlier time than the 
second event. In other words, with the merging scheme above, the propagation delay of a NOR gate 
might be incorrectly measured from the later input There is no simple fix to the merging rules above 
that solves this problem. The correct solution requires knowledge of both the new CS event and the 
new FV event, so that pending events can be saved if they are compatible with both newer events. If 
the charge-sharing and final-value calculations are merged, as suggested at the end of section 4.1.1, it 
should be straightforward to implement the correct merging scheme. 

There are several alternatives for dealing with aborted events. The simplest approach is to 
handle the event as if it were never scheduled, Le., do nothing. This is the approach rsim adopts. 
Another approach is motivated by the physical significance of an aborted event. Since the signal 
changes between the transition start time (the time when the charge-sharing or final-value computation 
was performed) and the transition end time (the scheduled time of the event), the action of aborting 
the event corresponds to a stop in mid-transition. Aborted transitions are termed glitches 
[Thompson74]; these malformed signals sometimes have significant impact on the operation of a circuit 
and should be reported to the user. This report can be in the form of a forced transition to X, or just 
a simple error message. Interestingly, a user who has the option to receive glitch reports almost always 
disables that feature [Ulrich73]. The reason given is that the duration of an aborted transition is 
usually short enough so that the actual signal does not change significantly; hence no glitch actually 
occurs.f 

Scheduling an event entails inserting it into the event list placed according to its scheduled time. 
An event list implemented as a simple list would impose a noticeable scheduling overhead. RSIM 
adopts several techniques for reducing this overhead. It quantizes simulated time, and rounds off each 
event time to the nearest time quanta; in the current implementation, the time quanta is 0.1 
nanosecond. The event list is implemented in two pieces: 

(1) an event array. Each array element is a doubly-linked list of events for a 
particular time quanta. 

(2) an overflow list, a doubly-linked list of events, sorted by event time. 

This organization is similar to that found in many conventional gate-level simulators [Vaucher75, 



fSome researchers propose showing transitions between logic states as 0-X-1 or 1-X-O, where the initial transition to 
X happens immediately. Thus aborted events leave the node value at X until some subsequent event re-establishes a 
legitimate logic state. This suggestion doubles the number of events in a simulation; a cost which might outweigh the 
advantages. 



-75- 



Ulrich76]. The event lists are doubly-linked to allow quick removal of an aborted event from the list. 
The data structures are diagramed in the following figure. 
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Figure 4.14. The event list is implemented with an event array and overflow list 

The event array is managed as a circular buffer in which the N array elements hold events for the next 
N time quanta. An array index indicates which array element corresponds to the current simulated 
time. If a new event is scheduled for a time M quanta in the future, where M<N, the event is added 
to the end of the event list stored in array element (index + M) mod N; no sorting or searching is 
required. If M>N, the event is inserted into the overflow list according to its scheduled time. The 
array size is chosen so that most events are scheduled directly into the array. With a time quanta of 
0.1 nanoseconds, a 128- or 256-element array captures most events in modern MOS designs. Note that 
events are added to the end of an event list This ensures that events are processed in first-in, first-out 
order, Le„ in the order created. Thus, cause-and-effect relationships are preserved. 

To find the next event to process, the event array is searched starting at the current index, until 
an event is found. Each increment of the index corresponds to advancing simulated time by one time 
quanta. If the array is empty, simulated time is advanced to equal the scheduled time of the first 
event on the overflow list; this event becomes the next one to processed. When an event is located 
for processing, the overflow list is examined to find events whose scheduled times are less than N time 
quanta away from the new simulated time. Such events are moved from the overflow list to the 
appropriate list in the event array. This preserves the first-in, first-out event ordering mentioned 
above. 
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4.2. Speeding up the simulation 

No simulator is fast enough. Increased simulator performance is always in demand, either to 
achieve faster turnaround during the design process, or to allow more complete testing during 
verification. This section discusses several techniques for improving the performance of the algorithms 
presented in the previous section. 

It is not surprising to learn that, during event processing, most of the time is spent in the final- 
value calculation.! To compute the final value for a given node, the final-value computation must visit 
all the nodes in the current stage. Thus, if there are n nodes in the stage, processing the entire stage 
takes 0(n 2 ) time. Since the remainder of the processing is proportional to the size of the stage, the 
real bottleneck is the final-value computation. Performance can be improved by 

(1) introducing a cache for final-value computations, with the intent of eliminating 
the recalculation of parameters for subnetworks. 

(2) reducing the number of nodes in the stage. 

(3) reducing the cost of each calculation, for example, by substituting integer 
arithmetic for floating-point. This alternative will not be discussed further, 
except to note that a 32-bit integer has over 9 orders of magnitude of dynamic 
range, sufficient for representing mos resistances. 

Clearly, the first improvement is most significant when n is large. The third improvement is important 

when n is small and the dominant cost is the actual arithmetic. The second improvement works on 

making (3) more important than (1). The improvements are discussed in turn below. 

As it is currently formulated, the final-value procedure performs many redundant computations. 
Consider the circuit diagram for a 5-node stage shown in (a) below, and one of its subcircuits, shown 
in (b) below. 



fThe discussion in this section is limited to that portion of the simulator which propagates new values through the 
network. RSIM has an interpreted LISP-like command language which the designer uses to prepare new input 
values and process the results of a simulation step. Depending on the sophistication of the simulation environment 
built by the user, a substantial portion of the total time can be spent in the command language interpreter. Of 
course, there is room for improvement here too. but that is outside the scope of this thesis. 
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(a) 5-node sage (b) example subcircuit 

Figure 4.15. Stage containing 5 nodes and 4 transistors 

When one traces the computations performed by the final-value procedure (see figure 4.10), it 
becomes apparent that the parameters for a specific subcircuit are calculated several times. The 
computations for nodes A, B, and C all need the same information about the subcircuit in figure 
4.15(b); there is no reason to compute the information more than once. 

The amount of redundant computation can be reduced by caching the result from each call to 
the final-value procedure, f Before each call, the cache is searched to see if the subcircuit was analyzed 
previously; if so, the results are taken from the cache and not recomputed. If the cache has constant 
access time, the cost of the final-value analysis for a stage is reduced to 0(n), a significant saving 
when n is large. In RSIM, the cache does not need to accommodate arbitrary amounts of information; 
associating two cache entries with each transistor (one for the source, one for the drain) is sufficient 
The source cache retains the network parameters for the subnetwork connected to the drain node 
(including the transistor), and the drain cache is similar. When the analysis of a subnetwork is 
completed, the result is placed in the appropriate cache. 



( subnet #1 V^J U-( subnet #2 J ( subnet #1 Vr^ Ur 

source cache drain cache source cached filled 

(a) circuit showing caches (b) circuit after analysis of subnet #2 



Figure 4.16. Transistor cache scheme 

In the figure above, once subnet #2 has been analyzed and the result saved in the source cache, 
subsequent analyses involving the same transistor and subnet use the cached result The following 
tThis caching technique is known in the LISP community as memoization. 
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figure shows the cache status after calculation of the final value for node D of figure 4.15(a). 



D 



d %T5 



Wu T5 



^ 



Figure 4.17. Cache status after final-value calculation for node D 

Subsequent analysis of node C, for example, requires only a single recursive call (rather than four as 
before). 

There are several reasons why the transistor cache might not be the ideal solution. The amount 
of information in each cache entry — 8 parameters — is quite large compared to the transistor data 
base. This suggests that cache entries should be dynamically allocated when needed, and returned 
when the computation is complete. The combined costs of storage management and cache access 
might exceed the cost savings realized on stages of modest size. These objections can be addressed by 
associating cache entries with nodes instead, or using the cache only when the stage exceeds a 
specified size. 

However the cache is organized, its introduction has a substantial impact on the amount of 
computation required for the final-value analysis of a stage. Another improvement mentioned at the 
beginning of the section is reducing the number of nodes in a stage. The key element of this is the 
notion of useless nodes, Le., nodes that do not connect to any transistor gates and hence whose values 
are irrelevant Such nodes commonly occur in a pulldown path containing more than one transistor, 
such as the node marked by an asterisk in figure 4.18(a). 
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(a) nMOS logic gate 



(b) pulldown after removing useless node 



Figure 4.18. Removing useless nodes from a stage 



Section 3.4.4 mentions that a pulldown with more than one transistor is electrically equivalent to a 
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single-transistor pulldown of the appropriate size. This suggests that such a pulldown can be replaced 
by a circuit like the one shown in figure 4.18(b). All the nodes in the pulldown except the output and 
GND are eliminated, and all the pulldown transistors are replaced by a single transistor. The gate value 
of the single transistor is the logical conjunction of the values of the gates of the original pulldown 
chain. In fact, RS1M uses a compact representation for the generalized MOS gate: 
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Figure 4.19. Efficient internal representation of an nMOS logic gate 



All transistors and nodes that make up the gate are eliminated, and the resulting gate structure is 
associated with the output node. The output can still connect to other transistors that are not 
recognized as part of a logic gate; only those transistors that implement a mos logic gate are 
compressed. The resistance parameters of a gate structure are computed very efficiently by RSIM — 
many times more quickly than the analysis of the equivalent network. 

The compression of gate circuits into the compact internal representation also results in a 
considerable space saving. Somewhere between 40% and 80% of the transistors in most circuits are 
eliminated when the gate structures arc built. This resulting simulation runs roughly twice as fast as 
the uncompressed network. This optimization is probably the single largest contributor to the ability 
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of rsim to deal with very large mos circuits. 

43. Escape mechanisms 

Previous sections of this chapter introduced mechanisms that allow the user to adjust the 
operation of the simulator as a whole. There are occasions, however, when a large-scale adjustment is 
inappropriate, and only the predictions for a single node need correction. This section discusses 
several "escape" mechanisms provided by rsim for adjusting the predictions for small groups of nodes 
and transistors. 

The modifications discussed here are ad hoc in nature; their motivation arises from purely 
practical considerations. The mechanisms are not intended to allow wholesale changes in the 
simulation computation, but are provided so the designer can correct particularly egregious or far- 
reaching errors in the simulation of specific circuits. Since the mechanisms treat the symptoms and not 
the disease, their effectiveness is limited to local improvements. 

The are four user-adjustable parameters for each node: 

VLOW the logic low threshold for the node (specified in normalized voltage units). 

vhigh the logic high threshold for the node (specified in normalized voltage units). 

tplh the low-to-high transition time for the node (specified in time quanta). 

TPHL the high-to-low transition time for the node (specified in time quanta). 
By adjusting the logic thresholds with VLOW and VHIGH, the user can prevent predictions of X values 
for circuits with non-standard pullup/pulldown ratios. This can be useful in a circuit where a node's 
voltage swing is reduced for performance or other reasons (for example, in input buffers or bit-lines of 
dynamic memory circuits). 

The transition time parameters force the timing of all the node's transitions. These parameters 
allow adjustment of the timing of critical nodes to agree with predictions of circuit analysis programs. 
Clocks, for example, often are generated by special circuitry designed to drive the a capacitive load. 
Intricate timing chains involving bootstrapping, etc. increase the speed of clock distribution circuitry to 
acceptable levels. Most of these circuit techniques are beyond RSlM's ability to predict accurately; 
incorrect predictions for critical signals can throw off the whole simulation. Using the transition time 
parameters, the designer can force the rise and fall times of critical signals to their proper values, 
improving the quality of the remainder of the simulation. 

It is obvious how transition time parameters affect the scheduling of events, but what about the 
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timing of a node connected directly to a forced node by a source/drain connection? A workable 
scenario treats a node with forced timings as an input, setting its dynamic resistance 
(Rvdd, Rgnd* and R x ) and capacitance parameters to zero. (Note that the value calculation, which uses 
static resistances, is unaffected.) The transition time for a node connected to a forced node is the sum 
of the given transition time for the forced node and the RC time constant of the path from the forced 
node. 
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(a) original circuit with forced node 



(b) equivalent network for node B 



Figure 4.20. How forced timings qffect neighboring nodes 

If a node is connected to more than one forced node, the smallest forced time constant is used. 
Neighbors of forced nodes always change value after the forced node — a reasonable prediction. 

A much more powerful mechanism for forcing the desired prediction is modification of the 
circuit itself, replacing troublesome configurations with others that simulate correctly. Piecemeal 
modification of a large circuit can quickly lead to a loss of confidence in the simulation results, 
especially if the replacements are performed in a haphazard manner. On the other hand, the 
systematic identification and replacement of specific subcircuits, drawing from a library of approved 
replacements, offers the opportunity to improve simulation accuracy for common subcircuits. 

The pattern matching/replacement program match, written by John Her [Iler83], provides an 
efficient way to systematically modify pieces of large circuits. The circuit to be modified is identified 
by a pattern specifying a prototype subcircuiL Each node in the prototype is given a type which 
controls what nodes it matches in the actual circuit: 

(1) matched only by a circuit node with exactly the same connections specified in the 
pattern. 

(2) matched by a circuit node with at least the connections specified in the pattern, 
but the circuit node may also have other connections. 



(3) matched by a node with the same name. 
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The pattern indicates which prototype nodes attach to each transistor in the prototype, and can further 
constrain the match by giving an explicit size or resistance for each prototype transistor. The 
replacement can modify parameters of existing circuit components, and add or delete components. 
For example, the following figure shows a pattern and replacement for the bootstrap circuit discussed 
in section 3.2. 
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(a) pattern (b) replacement 

Figure 4.21. Pattern/replacement for bootstrap circuit 

match is regularly used in at least one industrial environment to improve the predictions of 
RSIM. Her suggests other uses for the program: gathering of circuit statistics, identifying common 
circuit errors, and implementing circuit changes (ECO's) without requiring the regeneration of the 
entire nedist. match has proved to be a handy tool. 

4.4. An evaluation of RSIM 

rsim has simulated a large number of designs, both in university and industrial environments. 
Industrial designers are attracted to RSIM because of its ability to correcdy predict the functionality of 
most MOS circuits without designer intervention — a unique capability in a logic simulator efficient 
enough to accommodate large designs. RSiM's timing estimates are helpful in locating gross timing 
errors in industrial designs, but the conservative nature of the estimates make them unsatisfactory for 
fine tuning critical circuitry. In short, RSIM allows the verification of large industrial designs, at a level 
of detail not obtainable with other simulators. 

Timing estimates appear to be more important for academic users who, more often than not, 
have not paid as much attention to the performance of each individual circuit component RSIM makes 
a good breadboard for locating performance bottlenecks and experimenting with potential solutions. 
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Since transition timings automatically reflect output loadings and device sizes, the naive user's 
attention is focused on critical portions of the design. RS1M is a good companion for the novice 
designer because of its ability to qualitatively model much of the behavior of MOS circuitry. 

RSIM advances the state of the art of simulation in several ways. The linear model embodied by 
RSIM is a systematization of a common rule-of-thumb for estimating circuit performance. The 
simulator was originally developed simply to automate the calculation of RC time constants, and to 
reap the benefits of applying the same timing criteria uniformly to the entire circuit The analysis of 
propagation delay in Chapter 3 justifies the use of the linear model as a simple approximation and 
extends the rule-of-thumb to include the affects of the input waveform timings on gate propagation 
delay, rsim breaks new ground by combining logic-level simulation with the ability to automatically 
estimate transition times directly from the electrical properties of the circuit components. While the 
results are less accurate than circuit analysis, the designer is compensated by an increase in 
computation speed by several orders of magnitude, rsim represents a first cut at a stylized form of 
circuit analysis which attempts to model the significant effects at far less cost than traditional analysis 
techniques. The proven utility of RSIM augurs well for further developments in the area between logic 
simulation and circuit analysis. 

The introduction of intervals to characterize the operation of circuit components controlled by 
X-valued signals is a novel technique for merging electrical analysis with the logical concept of 
unknown signal values. The use of intervals allows one to easily compute the electrical consequences 
of unknown node values, resulting in predictions more satisfactory than those obtainable from 
conventional logic simulators or circuit analysis programs. 

There is, of course, plenty of room for improvement in rsim! For example, interconnect is not 
modeled at all. As a circuit's physical size decreases, the transmission delay introduced by the 
interconnect is as large as the propagation delay of the gates. Certain layout techniques, such as a 
long run of polysilicon, are inherently slow and might become the fatal flaw in an otherwise carefully 
tuned design. [Penfield81J offers some computationally reasonable models for predicting transmission 
delays; these models are well-suited for incorporation into RSIM. His analysis, along with that of 
[Horowitz83], offers some insight into the correct modeling of pass gates and distributed capacitances. 
(The lumped approximation used by rsim can be very pessimistic.) Along the same lines, the 
development of better time constants for charge-sharing events would improve the modeling of circuits 
containing both large and small capacitances. 
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Another class of problems is introduced by the one-pass nature of the computations. In order to 
limit the amount of computation needed for each prediction, the algorithms are constrained to make 
only one pass over the surrounding network. While most MOS circuits are trees, and hence amenable 
to a one-pass analysis, circuits that contain cycles are not handled correctly. The proposed solution — 
choosing a single path through the cycle to represent the cycle's resistance — is definitely ad hoc; 
performing the correct series/parallel analysis would be preferable. 

There is also a need to consider the effects of deviations in device performance from that 
predicted by first-order theory. Some effects (channel length modulation, body effect, short channel 
effects) might best be handled during the calibration process. Other effects (Miller capacitance) may 
lead to further modifications in the model or calculation of device parameters in order to ensure 
conservative predictions. Finally, there is the possibility that work on waveform bounding [Wyatt83], 
which seeks to obtain closed-form equations for the waveform of each node of a circuit, can provide a 
replacement for the linear model presented here. 
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CHAPTER FIVE 



Simulation Using a Switch Network Model 



If a designer is only interested in the logical properties of a circuit, ie., those properties 
independent of performance issues, it is possible to simplify the linear model of the previous chapter 
even further by modeling each transistor as an on/off switch whose state is determined by the type of 
transistor and the state of its gate node. This chapter discusses the switch model from two points of 
view: first, as a special case of the linear model, and then as a self-contained model. But first, a small 
digression on the representation of node values is in order. 

5.1. Representing node values 

The success or failure of a logic-level simulator often hinges on the choice of the set of possible 
node values. If the set is too small, the actual node value may not be precisely described by any one 
of the available values and the simulator must choose an approximation. Usually the approximation 
involves some variant of the X (unknown) value which may carry logical implications beyond what the 
network itself imposes — such a choice is termed either "conservative" or "pessimistic" depending on 
one's point of view. If the set is large, it becomes difficult to establish whether die simulator's 
calculations are correct in all cases. Relying on the accumulated evidence of many simulation runs 
when arguing correctness lacks the rigor that leads to total confidence in the algorithm. This section 
develops criteria for evaluating a set of node values. 
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Therc arc three major influences on the choice of the node-value set: 

(1) the need to report node values to the user; 

(2) the need to determine the state of each network component from the values of 
its terminal nodes; and 

(3) the need to represent intermediate values during an incremental simulation 
calculation. 

If only the first two influences are considered, a three-value set — 0, 1, and Xf — will suffice for 

logic-level simulation. Users and component models cannot reasonably expect more information than 

provided by this set, since most logic-level algorithms cannot support more detailed deductions from 

arbitrary MOS networks with any degree of accuracy. It is the third influence that leads to all the 

complication. 

Almost all logic simulators analyze a network piece by piece, modifying their estimates for node 
values as the effect of each piece of the network is determined. Until the new-value computation is 
completed, the intermediate node values serve as accumulators that store all the information the 
simulator has about the effects of network pieces already examined. Thus, distinct values are needed 
for all qualitatively different intermediate states; eg., a node currently at logic high might have that 
value because examination of the network to date revealed that it was (i) storing charge, (ii) connected 
to a depletion pullup, or (iii) being precharged by an enhancement device. The simulator must 
distinguish among these possibilities, since the final value of node may be different in each case if, for 
example, further network processing discovers a pulldown for the node. The exact number of values 
needed depends on the details of the simulation computation; most simulators fall into one of the two 
categories discussed below. As will be seen, the two categories are distinguished by their approach to 
X values. 



fit might be useful to distinguish X', an unknown, but legitimate logic value (e.g., the output of a pair of cross- 
coupled inverters) from other types of X values. X' values are well behaved in logic operations, for example. B + 
-<B = 1 if the value of B is X', but equals X if the value of B is X. Such distinctions might be important during ini- 
tialization. [Stevens83] describes a simulator that uses this distinction to improve its predictions for certain simple log- 
ic circuits. 
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5.1.1. Cross-product value sets 

One intuitively appealing approach to choosing a set of node values is to think of each value as 

having several distinct attributes chosen from independent categories. Thus, for example, one might 

characterize a node's logic state and the "strength" of the value separately. The logic state is usually 

one of 0, 1, or X; sometimes a high-impedance state, Z, is included to represent the output of tri-state 

logic gates [Flake80, Holt81]. The strength indicates what sort of network connection exists between 

the source of the value and the current node: 

input. Node is a designated input (e.g„ vdd or GND). The value of an input node can 
only be changed by explicit simulator commands — the assumption is that inputs 
supply enough current to be unaffected by connections (possibly shorts to other 
inputs) made by transistor switches. 

driven. Node is connected by closed switches to inputs or other driven nodes. Driven 
nodes can affect the value of weak or charged nodes without being affected 
themselves, but may be forced to an X state if shorted to an input or driven node that 
has a different logic level. 

weak. Node is connected to an input node by a depletion-mode transistor. Weak 
nodes can affect charged nodes without being affected themselves, but are forced to a 
driven state when connected to another driven or input node. A weak node returns 
to the appropriate weak state when completely disconnected from driven or input 
nodes (ie., a weak node can never enter the charged state). 

charged. Node is connected, if at all, only to other charged nodes. Until reconnected 
to some other part of the network, charged nodes maintain their current logic state 
indefinitely (charge storage with no decay). This is the default state of all non-weak 
nodes. 

Other strengths can be included to model the effects of differently sized transistors, node capacitors, 

etc. 

The plethora of 9-, 12-, and 16-state logic simulators (see [Newton80]) use values chosen from 
the set formed by the cross product of the various value attributes. For example, a 9-state simulator 
might use 
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Note that in this formulation, X is treated as sort of a third logic value on a par with and 1; 
presumably X's are generated by the simulator to model invalid combinations of O's and l's. The 
implication is that one can determine if a value should be X without any consideration of strengths. 
(Remember that the main motivation of forming the cross product is that the various attributes are 
independent). This can lead to pessimistic predictions, as is shown in an example below. 

It is useful to order the possible signal values according to their relative strengths. Intuitively, 
value A is stronger than value B, written A > B, if value A predominates when both signals are shorted 
together. Of course there are situations where neither value emerges unscathed — for example, when 
two signals of the same strength but opposite logic states are shorted — in which case neither signal is 
said to be stronger than the other. The notion of strength can be formalized using a lattice of node 
values, for example: 

DX 
^ \ 

DH DL 

\ ^ 

WX 
^ \ 

WH WL 

cx 
^ \ 

CH CL 

A 

Figure 5.1. Lattice of node values for a 9-state simulator 

The node value \ is used to represent the null signal, te, no signal at all. 

Referring to the lattice, given two values A and B, A > B if A is not equal to B and there is an 
upward path through the lattice that starts at B and reaches A. For example 
DX is greater than all other signals, 
DH is greater than WL, but 
WL is not greater than WH. 
The least upper bound (l.u.b.) of two values A and B, written A U B, is defined to be the value C 
such that 

(i) C > A 
(ii) C > B 
(iii) for every value D, if D > A and D > B, then D > C. 
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Examining the lattice above, it is easy to see that the l.u.b. always exists for any two node values. 
Note that if A > B, A U B = A; the l.u.b. captures our intuition about what should happen when two 
signals of different strengths are shorted together. With the appropriate placement of X values in the 
lattice, the Lu.b. can be used to predict the outcome when any two signals are shorted. 

The interpretation of X values captured by the lattice above is quite appropriate for describing 
the logic state of nodes involved in a short circuit: 




dx = dh u DL 



Figure 5.2. A short circuit leading to an X value 

Assuming the two transistors are the same size, the middle node's value is the result of merging two 
equal strength signal values. According to our lattice, this merger yields an X value. Short circuits are 
the mechanism by which X's are introduced into a network previously containing only O's and l's. 

However, the situation is not as straightforward when one considers connections formed by 
transistors with a gate signal of X. The resulting values cannot be computed directly using the U 
operation on the source and drain signals, and once that hurdle has been surmounted, there is some 
difficulty in choosing which value to use from the cross-product value set Consider the following 
analysis of a node with stored charge and connection to two transistors. 



CL 





I 

(a) (b) (c) 

Figure 53. Incremental analysis of a simple network 

Before any connections to the node have been discovered (figure 5.3(a)), the node maintains the 
charge of its last driven value, say, logic low; the simulator would assign the node a value of CL. 
After the first transistor is discovered (figure 5.3(b)), the facts change: 
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(i) Because of the X on the gate of the transistor, one cannot be certain what type 
of connection exists between the node in question and the DH on the other side 
of the transistor. Thus, the new logic state of the node should be X. 

(ii) The strength of the new value is uncertain, but clearly "weak" or "charged" 
would be inappropriate since they understate the strength in the case where the 
unknown gate value was actually a 1. 

Since a weak or charged value could be overridden by an enhancement pulldown discovered later on, 

mistakenly leading to DL value, the simulator has no choice but to select a driven value. The 

conclusion: DX is the only state available that handles all eventualities in a conservative fashion. Of 

course, with knowledge of what the rest of the network contains, the simulator could make a more 

intelligent choice, but this is beyond the ken of an incremental algorithm. 

By the time a connection to a depletion pullup is discovered (figure 5.3(c)), the die has been cast: 
the previously chosen DX value overrides any contribution by the pullup (DX U anything = DX). 
While this answer is not wrong, it is more conservative than required; at this point the logic state of 
the node should be 1. The pullup guarantees a logic 1 with the unknown connection to DH, only 
leaving doubts about the strength of the value (somewhere between weak and driven). 

Proponents of cross-product value sets might point out that the analysis would have generated a 
different answer if the transistors had been discovered in a different order. The somewhat 
embarrassing ability to produce two different answers for the same network, both correct, is caused by 
the fact that the merge operation is not associative when connections are made through transistors 
with X gates. In fact, most incremental simulators that use cross-product value sets perform the 
incremental analysis, in an order that yields a reasonable answer on the example above. Unfortunately, 
it is usually possible to confound them with more complex circuits containing X's; while such circuits 
are not commonplace, they often crop up during network initialization when all nodes start off at X.f 

In conclusion, it is possible to build effective simulators using cross-product value sets; however, 
they can make conservative predictions on circuits that contain X*s. In practice, this leads to difficulty 
in initializing some circuits and to occasional over-propagation of X values. 



t[Bryanl81J suggests using an incremental calculation only for subnetworks of nodes connected by non-X transistors. 
Once these values have been computed, a separate computation merges subnets connected by X transistors. Since 
this computation has global knowledge of the network, it can avoid the problems mentioned here. 
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5.1.2. Interval value sets 

The difficulties with the cross-product value set arise because of its separation of the notion of 
strength and logic state. Once a node value is set to an X value at some strength, it cannot return to a 
normal logic state unless overpowered by a stronger signal; if a node is set to the strongest X value, it 
stays at that value for the rest of the computation. As in the example above, this leads to conservative 
predictions when the strongest X value is chosen because of the lack of suitable alternatives. 
Specifically the difficulty came about because the simulator had to pick the highest strength to be on 
the safe side; there was no value available that would indicate that the logic low signal which 
contributed to the intermediate X value was of very low strength and hence might be overridden by 
later network components. 

This suggests a different approach to constructing the set of possible nodes values, one based on 
intervals. First one starts with a set of node values with a range of strengths and 0/1 logic states, for 
example, the six non-X states used above: {DH, DL, WH, WL, CH, CL}. Then additional values are 
introduced by forming intervals from two of the basic values; if there are six basic values, then there 

arc (p = 15 such intervals, leading to a total of 21 node values altogether. 

Intervals represent a range of possible values for a node. The size of the range is related to the 
strength of its end points. If we arrange the six basic values in a spectrum ranging from the strongest 
1 (DH) to the strongest (DL), the possible node values can be shown graphically: 



DH w 
logic high WH ° 

CH * 

CL 
logic low WL 

DL 



6 f f T 

A O O 

A f A 

A A A 

A A A A 



Figure 5.4. The 21 node values of the interval value set 



Intervals that do not cross the center line correspond to a valid logic state: intervals above the line 
represent logic high values, and those below the line, logic low. Intervals that cross the center line 
represent X values. (The X values of the previous section correspond to intervals with equal strength 
end points: DX = [DL,DH], WX = [WL,WH], and CX = [CL,CH].) Thus, X values result from 
ambiguity about which of the base values best represents the true node value. As will be seen below, 
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this is more satisfactory than thinking of X as a third, independent logic state. 

When the simulator merges two node values, it chooses the smallest interval that covers all the 
possible node states. However, unlike the cross-product value set, the interval set can represent X 
values without loosing track of the strengths of the signals that lead to the X values. Consider the 
problems raised by figure 5.3(b). Using an interval value set, the resulting node value is naturally 
represented by [CL.DH], an interval that corresponds to an X logic state. When the pullup is 
discovered (figure 5.3(c)), the simulator can narrow this interval to [WH,DH] since the pullup 
overpowers the weaker CL value. This corresponds to a logic high signal — a sensible answer. 

An algebra for calculating the result of merging two interval node values is developed in 
[Flake83]; a different approach is adopted in section 5.4.1 where a detailed description of the merge 
operation can be found. With an interval value set, the merge operation is commutative and 
associative, and the network can be processed in any order without affecting the final node values. 
The extra 12 values introduced by the interval value set are needed to carry sufficient information 
about how the current value was determined, to ensure that the final answer is independent of the 
processing order. 

The examples above suggest the following conjecture about the correct size of a node value set 

Is 
Assuming that one has s different signal strengths and two logic levels (0 and 1), then 2s + (j) values 

are needed to ensure that the signal algebra is well-formed. In simulators with too few states, some 
states take on multiple meanings; for example, the DX value in the cross-product value set is used to 
describe nodes that fall into 5 separate values in the interval value set: 

[DLJJH] [WL.DH] [CL,DH] [WH.DL] [CH,DLJ 
This lack of expressive power on the part of cross-product value sets is what leads to pessimistic 
predictions for node values in certain networks. 

5.2. Developing the switch model 

Switch models of mos circuits are of interest since a switch is the simplest component that meets 
the criteria outlined in Chapter 1: switches are inherently bidirectional and the logic operations they 
implement can be computed with acceptable efficiency in large networks. 

Randy Bryant [Bryant79], one of the first to apply switch-level simulation to MOS transistor 
networks, viewed the network as divided into equivalence classes. Two nodes are equivalent if they 
arc connected by a path of closed switches. Nodes in the same equivalence class as vdd arc assigned a 
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logic high state; those equivalent to GND, a logic low state. A pullup (a depletion-mode transistor 
which is always on in the switch model) gives the node to which it is attached a special property: if an 
equivalence class of nodes does not contain cither vdd or GND, but docs contain a pulled-up node, all 
the nodes in the class are assigned a logic high state. Finally, if an equivalence class contains neither 
an input nor a pulled-up node, it is "storing charge" and maintains whatever logic state it had last 

The simulator based on this switch model iteratively calculates the equivalence classes for all the 

nodes in the network until two successive calculations return the same result (Le., no nodes change 

state). Unfortunately this pure switch model has some deficiencies: 

(i) Switches in indeterminate states (those with "gate" nodes of X) make the 
equivalence calculation somewhat more difficult. The desired computation is 
inefficient since it involves a combinatorial search; all combinations of on/off 
assignments to switches in the X state need to be investigated to determine 
whether a switch's state makes a difference. If the network is unaffected by a 
switch's state, the switch can be ignored; otherwise all affected nodes are 
assigned the X state. 

(ii) The equivalence calculation is much more time consuming than necessary since it 
deals with the whole circuit rather than focusing only on the parts which change. 

(iii) In certain circuits transistor "size" is important, and the notion of size cannot be 
expressed in the pure switch model. A pullup is a trivial example: viewed as a 
switch it was always on, but more "weakly" than the "strong" switches in the 
pulldown. The size of transistors also determines the "strength" of various driver 
circuits; for example, it is common for the write amplifier of a static memory to 
force a value into a memory cell by simply overpowering the weaker gate in the 
cell itself. 

The remainder of this chapter investigates different approaches to solving the first two problems 

outlined above. The third problem is addressed with some success by rsim which uses size 

information not only to calculate node values but to provide timing information as well.f 

The following sections present two different formulations of the switch model: 

• a model where each node value is computed via a "global" examination of the 
network. If the network has no explicit feedback, each node value is computed 
exactly once, but this calculation is more expensive than the one below. 

• a model based on "local" interactions where the simulator examines the source and 
drain nodes of each transistor and updates the state of one or both nodes. The 
examination/update process continues until there are no further updates to be 
made, if., the network has "relaxed" into its final state. Under this scheme each 
calculation is trivial but a node value might be computed more than once even 



t Bryant [BryariUU] proposes extending the switch model to include a hierarchy of switch sizes, a generalization of the 
ad hoc solution for pullups. His thesis develops an algebra, in the spirit of Boolean algebra, for dealing formally with 
such networks. 
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when there is no explicit feedback in the circuit. 
esim (the author's switch-level simulator) is a hybrid of these two formulations, esim implements a 
global node-value calculation using a node-value representation close to the one used by the local 
simulator. This results in a calculation very similar to that implemented by RSIM, except that abstract 
"logical" resistances (R e jf = 0, 1, and oo) are substituted for the "real" resistances used in RSIM. 
Since this type of simulation algorithm is discussed at length in Chapter 4, it will not be pursued here. 
Instead, the remainder of this chapter focuses on the new formulations introduced above. 

The local formulation is attractive because it appeals to our intuition about how transistors really 
work. The high degree of potential parallelism in the update calculation makes it a very attractive 
algorithm for many of the new parallel architectures now under development; the combination of 
parallel hardware and intrinsically parallel algorithms may be the key to overcoming the capacity 
limitations of current simulation techniques. 

5.3. The global switch model 

The global simulator calculates a node's value by computing the effect of each input on the node 
of interest The simulation is global to that each node value is based directly on the values of the 
inputs to which it is connected. Thus, the values of non-input nodes do not enter into the 
computation. This means that 0, 1, and X will suffice as final node values; a node state need only 
capture the logic state of the node and no strength information is necessary. 

5.3.1. Node values in the global switch model 

Each transistor switch in the network is assigned a state determined from the transistor's type 
and the current value of its gate node. This state models the switch-like qualities of the source-drain 
connection without trying to capture any more detailed information about the connection — a 
simplification of the linear model presented in earlier chapters. 

The state of a transistor switch summarizes the type of connection that exists between its source 

and drain nodes. For MOS circuits, the possible switch states are: 

open no connection, the state of a non-conducting n-channcl (gate = 0) or p- 

channel (gate = 1) transistor. 

closed source and drain shorted, the state of a conducting n-channel (gate = 1) 
or p-channcl (gate = 0) transistor. 

unknown uncertain connection between source and drain, the state of an n- or p- 
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channel transistor whose gate is X. 

weak the state of a depletion transistor. Depletion devices are always assigned 

this state, regardless of the state of their gate nodes. 

The relationship between a switch's state, its types, and its gate value is summarized in the following 

figure. 
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Figure 55. Switch state as a function of transistor type and gale voltage 



In the global simulator, the value of a node is determined by the inputs to which it is connected 
and the states of the intervening switches. During the calculation of a node's value, the simulator uses 
the interval node-value set presented in figure 5.4. When the calculation is complete, the resulting 
interval is used to determine the final logic state of the node, using the following table. 
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Table 5.1. Relationship between final logic state and computed interval value 

The calculation of a node's value begins by discovering all the inputs which can be reached from the 
node by paths of closed, weak, and unknown switches. If no inputs can be reached, the final logic 
state of the node is determined by a charge sharing calculation described in the next section. If one or 
more inputs can be reached, their contribution to the node's value is determined by an incremental 
calculation which starts at the inputs and works its way back toward the node. 

The value of a logic low input is DL; the value of a logic high input is DH. As the calculation 
works back toward the node of interest, it computes an effective value that indicates the effects of 
intervening switches on the original input value. The effect of a switch on a value it transmits is 



specified by the switch function: 



96 



input 



1 X 

value = switch(ff,, input value) 
Figure 5.6. Effective value of an input after passing through a switch 

The effect of a switch on a value is a function of the value and the switch's state: 
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Table 5.2. swilchfa, value) as a junction of a and value 

A new value, X, is introduced to describe the value transmitted by an open (non-conducting) switch, 
ie, no value at all. The value X is weaker than CH or CL, and corresponds to a logic state of X. 

When two paths merge, their effective value is determined using the U operation introduced in 
section 5.1.1. 
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Figure 5.7. Merging the values for two paths which join 



The U operation is defined using the lattice shown in the following figure. 



[DHJ)L] 

DH DL 

I I 

[DH.WL] [WH.DL] 

[DH.WH] [WH.WL] [WL.DL] 

/ \/ \/ \ 

[DH.CL] WH WL [CH.DL] 



{DH.CH] [WH.CL] fCH.WL] [CL.DL] 

/\ /\ /\ /\ 

PH.X] [WH.CH] [CH.CL] [CL.WL] fX.m.] 

\/ \/ \/ \/ 

[WH,X] CH CL [X.WL1 



[CH.X] [X.CL] 



Figure 5.8. Lattice for interval-node value set 

Following the procedure outlined in figure 5.7, the contributions of all inputs connected to the node of 
interest can be reduced to a single interval. This interval is merged (using U) with the contribution 
from the node's current logic state 



contribution of current logic state = 



CL if current logic state = 

CH if current logic state = 1 

[CH,CL] if current logic state = X 



to give the final interval characterizing the node's new logic state. 

As an example of how the new-value calculation works, consider the following circuit: 



(5.1) 
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Figurc 5.9. Example circuit 

Assume that the current logic state of the output is 0. The new-value calculation for this circuit is 
shown in the following figure. 



WH 




dosed 




(a) 



(b) 



(c) 



Figure 5.10. New-value calculation for circuit in figure 5.9 



The final interval for the output node is CL U [\,DL] = [CL,DL] which corresponds to a logic low 
state. This makes sense; the previous state of the output node was logic low, so the uncertain 
connection to the inverter does not affect its logic state, just the strength with which its driven. Note 
that it is important to merge the values of paths that join before continuing with the calculation since 



switch(a, a U /?) * switchia, a) U switch(o, /3) 



(5.2) 



when using this particular value set and switch function. For example, if the WH and DL values had 
been merged after transmission by the switch in the unknown state, the final interval for the output 
node would have been [DH,WL], which corresponds to an X logic state. The calculation described 
here performs all possible merges before transmitting the result through the appropriate switch. 
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5.3.2. The global simulation algorithm 

This section outlines the basic steps for propagating new information about the inputs to the rest 
of the network, recalculating node values (where necessary) using the global value calculation in the 
previous section. 

When a node changes value, it can affect the network in one of two ways: 

(i) directly, through source/drain connections of conducting transistors. 

(ii) indirectly, by affecting the state of transistor switches controlled by the changing 
node. This is turn can cause the source and drain nodes of those switches to 
change value. 

The global simulator accounts for these two effects using to different mechanisms. Directly affected 

nodes are handled implicitly by the new-value computation which recomputes new values for all 

directly affected nodes whenever a node changes value. This is a reasonable organization: if A directly 

affects B, then B directly affects A; it makes sense to compute both values at the same time since they 

are closely related. Direct effects are not handled implicitly, however, when the user changes the 

value of an input node. In this case, the simulator invokes the new-value computation on the input, 

not to recompute the input's value (which is set by the user), but to recompute the values of all 

directly affected nodes. 

The indirect effects of a value change are managed by an event list that identifies all transistor 
switches that have changed state. Actually, the event list keeps track of the nodes that have changed, 
but this is equivalent since the network data base maintains a list of transistors controlled by each 
node. The simulator operates by removing the first node from the event list, and then performing a 
new-value computation for the sources and drains of all transistors controlled by that node. The new- 
value computation accounts for all the direct effects of the new transistor state and adds events to the 
event list if indirect effects are present This process continues until the event list is empty, at which 
point the network has "settled" and the simulator waits for further input 
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whilc event list not empty { 

n : = node associated with first event on event list 
remove first event from event list 
for each transistor with n as gate node { 
set COMPUTE flag for source and drain 

} 

for each transistor with n as gate node { 

if compute still set for source, compute new value for source [fig. 5.14} 
if compute still set for drain, compute new value for drain 

} 



} 



Figure 5.11. Main loop of global simulation algorithm 



Finding nodes affected by an event is straightforward; recomputation of values is needed for the 
sources and drains of all transistors with the changing node as gate. For example, if the node marked 
(*) in the following figure changes, nodes B and C need recomputation. 
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Figure 5.12. Event for node (*) involves nodes B and C 



Of course, node D also needs to be recomputed, as will be discovered during the processing of B and 
C (see below). 

To recompute the value of a given node, the simulator first makes a connection list containing all 
nodes connected to the first node by a path of conducting transistors. The idea is to start with a node 
known to be affected by an event, and then find that node's electrical neighbors, and so on, halting 
whenever an input is reached. In the example above, if the (*) node's value is 1, the connection list 
for node B contains nodes B, C, and D. If the (*) node's value is 0, the connection list for node B 
contains only node B. Node A is not included in the list in either case because it is not connected to 
node B by a path of conducting transistors. In the code below, which computes the connection list for 
a given node, the terms "source" and "drain" are used to distinguish one terminal node of a transistor 
from the other, and do not imply anything about the terminals' relative potential. 
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initialize list to have starting node as only clement 

set pointer to beginning of list 

input found : = false 

reset capacitance accumulators 

while pointer not at end of list { 
n : = node currently pointed at 
add capacitance of n to appropriate accumulator 
for each "on" transistor with source connected to n { 
if drain is an input, input, FOUND : = true 
else if drain not on list, add drain to end of list 

} 

advance pointer to next list element 

} 

Figure 5.13. Non-recursive routine to build connection list 

In addition to the connection list, the routine sets INPUT_F0UND to true if the tree walk discovered at 
least one input, and maintains three capacitance accumulators, one for each logic state. The 
connection list drives the new-value computation: 

make connection list starting with given node [fig. S.13] 
if no inputs found, do charge sharing 
else for each node on connection list { 

compute interval value for node [fig 5.15} 
determine new logic state using Table 5.1 
if different from old logic state { 
update logic state to new value 
enqueue new event 
. } 
} 
reset compute flag for each node on connection list 

Figure 5.14. Subroutine to compute new value for node 

If no inputs are found while building the connection list (input. FOUND is false), the group of nodes is 
completely isolated from any inputs and a charge sharing computation determines the nodes' new 
values. Assuming that all the node capacitors are shorted together, the resulting voltage is 

.... ^capacitors at logic high 

voltage of snorted capacitors = xr-r, (5-3) 

jjm capacitors 

Capacitors with a logic state of X are assumed to be charged high when computing the maximum 
possible voltage, and charged low when computing the minimum voltage: 



charge sharing value 



C high + C X < Q 2 

C total 
Chigh 



Ctotal 
otherwise 



> 0.8 (5.4) 
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where C tola i is the sum of the capacitance accumulators, Chigh is the accumulator corresponding to 
logic high, and Cx is the accumulator corresponding to logic X. 

If one or more inputs are found (input found is true), the value of each node is determined in 
accordance with the procedure described in the previous section. The interval value is calculated for 
each node in turn and the node's new logic state is computed using Table 5.1. New events arc added 
to the end of the event list whenever a node changes value. If a changing node is already on the 
event list, nothing happens (the node is not moved to the end of the list). 

For efficiency, each affected node's value is only computed once while processing a given event 
The connection list ensures that all affected nodes are recomputed; the compute flag ensures that 
once a node has appeared on some connection list, it will not be resubmitted for processing during the 
current event 

The computation of a node's value is easily described by a recursive procedure which analyzes 
the surrounding network: 

if node is logic low input { 

return DL 
} else if node is logic high input { 

return DH 
} else { 

local, iv : = value specified by equation 5.1 
set visited flag for current node 

for each "on" transistor, t with source connected to current node { 
if drain does not have VISITED flag set { 

recursively determine interval value for drain node 
local IV := local iv U switchia t , drain's interval value) 
} 
} 

reset VISITED flag for current node 
return local rv 
} 

Figure 5.15, Subroutine to compute interval value for node 

The variable local. iv is a stack-allocated local variable of the subroutine. Returning to the example 
in figure 5.12, assuming that the (*) node's value is 1, and that the old values for B, C, and D are 
B = 1, C =0, and D =0, the following calls are made when computing the new value for node C: 
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computc_params(C) 

LOCAL, IV = CL 

compute_params(D) 

LOCAL, IV = CL 

compute_params(VDD) 

return DH 
LOCALJV = CL U WH = WH 

compute_params(GND) 
return DL 

LOCALJV = WH U DL = DL 

return DL 

LOCALJV = CL U [X,DL] = [CL.DL] 

compute_params(B) 

LOCALJV = CH 

return CH 

LOCALJV = [CL.DL] U CH = [CH.DL] 

return [CH.DL] 
Figure 5.16. Trace of interval value computation for example in figure 5. 12 



Marking each visited node (by setting its visited flag) avoids cycles; this keeps the tree walk 
expanding outward from the starting node. The visited flags are reset as the routine backs out of the 
tree walk, so all possible paths through the network are eventually analyzed. 





(a) original circuit 



(b) circuit as seen by tree walk 



Figure 5.17. The tree walk traces out all possible paths 



If the network contains cycles, the tree walk might lead to more computation than a series/parallel 
analysis; this is a problem for circuits containing many potential cycles (such as barrel shifters), 
especially during initialization when many of the paths are conducting because control nodes are X. 
To speed up the calculation, a node's VISITED flag can be left set, restricting the search to a single path 
through a cyclic network. This technique produces correct results only if paths leading away from a 
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node are explored in order of increasing resistance, Le., one must ensure that the first time a node is 
reached, it is by the path of least resistance. Of course, the flags must be reset once the entire 
computation is complete; fortunately, the connection list provides a handy way of finding all the nodes 
that are visited without resorting to yet another tree walk. Another alternative for speeding up the 
calculation is the caching technique described in section 4.2. 

5.3.3. Interesting properties of the global algorithm 

The event list serves to focus the attention of the global simulator; new values are computed 
only for nodes which appear on the event list or which are electrically connected to event-list nodes. 
Portions of the network that are quiescent are not examined by the simulator. Algorithms that have 
this property are said to be selective- trace or event-driven algorithms and generally run much faster 
than algorithms which are not event driven [Szygenda75].f 

An interesting implication of selective trace is that special care must be taken to ensure that 
"constant" nodes, such as the output of an inverter with its input tied to GND, are processed at least 
once (otherwise they will have the wrong values). One technique is to treat VDD and GND as ordinary 
inputs when first starting a simulation run — sort of a power-up sequence as VDD and GND change 
from X to 1 and respectively. Computing both the direct and indirect consequences of changes in 
VDD and GND might involve a tremendous amount of computation since the whole circuit is affected; 
often only computing the indirect consequences is a sufficient and less costly alternative. 

Although there is no explicit mention of time in the global simulator, the first-in, first-out (FIFO) 
processing of events imposes some ordering on the changes of node values. This ordering is similar to, 
but not the same as, the unit-delay ordering used by many gate-level simulators. In an event-driven 
unit-delay algorithm, the output of each gate that had an input change is recomputed using the current 
values of the input nodes. The new output values are saved and imposed on the network only after 
processing all gates. The net effect is that each computation cycle (representing a unit of time) 
propagates information through one level of gate, Le., each gate has unit delay. Because changes in 
node values are imposed all at once, values change simultaneously, which can lead to problems in 



fExccptions to this rule are some hardware-based simulation algorithms, such as programs run on the Yorktown 
Simulation Engine [Pfister82]. Tlie builders of the YSE point out that simulations might well run slower because the 
extra communication and branching needed to implement selective trace would compromise the parallelism and pipe- 
lining used to great advantage in the YSE. However, if sufficiently large portions of the circuits could be ignored, 
the overhead of selective trace could be worth the investment (see Chapter 6). 
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circuits containing feedback paths. 

The global simulator implements a pscudo unit-delay algorithm. New events are added to the 
end of the event list, so the oldest changes are processed before any consequences of those changes 
are processed. Thus, FIFO event management leads to the same sequence of gate evaluations' as a 
unit-delay algorithm. However, because the global algorithm changes values in the network 
incrementally rather than all at once, it is possible to find circuits that behave differently under the two 
simulators: 



0-1 




* 1— 0— 1— ... 



0-1 



4—0 !_♦()- 1— ... 




O 1-0 



(a) unit delay (b) pseudo unit-delay 

Figure 5.18. Circuit that distinguishes unit-delay from pseudo unit-delay 

A 0-1 transition on the input causes a unit-delay algorithm to loop forever. The global algorithm 
predicts only one transition — the output of whichever gate it processes first Neither answer is 
completely correct; the actual circuit enters a meta-stable state on a 0-1 input transition, eventually 
settling to a particular configuration determined by subtle differences in the gains of the two gates. It 
will not remain in the meta-stable state forever, so an infinite oscillation is a poor prediction. On the 
other hand, the final configuration chosen by the global simulator depends on the order of some list in 
the network data base. The predicted outcome is the same each time, not necessarily the best 
prediction, t The global simulator does not offer a general solution to the oscillation problem; both 
simulators will oscillate on the following circuit 



t[Bryant81] suggests that the oscillation can be detected and the offending node values replaced by X, but the tech- 
nique for determining the number of oscillations to allow yields answers so targe for circuits of any substantial size 
that this is not a very practical alternative. 
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Figure 5.19. Circuit which causes both simulators to oscillate 

Along the same lines, the global simulator predicts that the output of the circuit below will 
oscillate when the input changes from 1 to 0. 
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node which is both an input and output 



Figure 5.20. Circuit with a node that is both an input and output 

The actual output quickly rises to the balance point of the pullup/pulldown combination. In a logic- 
level simulation, this corresponds to finding a solution to the equation a = ~*a which has the solution 
a = X (a reasonable logic-level representation for the balance point). This example is drawn from a 
larger class of circuits where a node is both an input and output of the circuit. Since the new-value 
computation uses current transistor states (determined by current node values) to predict the new 
values, it is impossible to predict the value of a node that depends on its own value. This limitation 
has not proven to be a problem in practical circuits. 



5.4. The local switch model 

It is interesting to speculate about replacing the tree walk performed by the global simulator with 
a strictly local computation. After all, the models of transistor behavior presented in Chapter 3 show 
that a transistor is controlled by the voltages of its three terminal nodes, ie, each transistor operates 
independently, basing its behavior on only local information available at its terminals. The simulation 
model described in this section works in much the same way. The basic operation involves updating 
the terminal node values of a transistor switch using only information about their previous values and 
the state of the switch. 
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Relaxation-based algorithms leave one a little nervous. Will the relaxation terminate? Does the 
final answer depend on the order in which the individual computations are performed? These 
questions are answered below, after a description of the algorithm itself. 

5.4.1. Node values in the local switch model 

The set of node values and the computation developed for the global simulator must be adapted 
for use by the local simulator. The necessity for an adaptation is explained at the end of section 5.4.2. 
(The discussion is postponed until after the local simulation algorithm has been presented, when it will 
be easier to explain why the global simulator's techniques do not work in the local simulator's context) 

In the local simulator, a node value is a pair 

<high,low> 

that separately lists what type of connection exists to each of the two possible input signals. The high 

component summarizes what is known about paths to vdd, and the low component describes paths to 

GND. Ignoring for the moment switches with gates of X, four types of connections can be 

distinguished for each component: 

co no paths to inputs, no charge storage. 

S charge storage. 

1 there is a path to the appropriate input, but it passes through one or more 
depletion switches. 

there is a path of conducting n-channel (gate = 1) and p-channel (gate = 0) 
switches to the given input 

A switch with a gate of X may or may not make a connection; the resulting path is characterized by an 

interval describing the range of alternatives, u) = 6 intervals are needed to describe all possible 

combinations of paths. 

The value of vdd is <0,oo> and of GND is <co,0>; some other examples are shown in the 
following figure. 
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(a) <1, 0> 



(b) <{0,1], S> 



(c) <S, S> 



Figure 5.21. Examples of node values in the local simulator 
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CH 



This organization provides for many more values than actually needed by the simulator; many of the 
values make distinctions that are not important in determining a node's logic state. For example, <1,0> 
and <S,0> both represent values corresponding to pulled-down nodes — it does not matter what the 
high component contributes if it is weaker than the low component. The advantage of this notation is 
the ease of computing what a given signal looks like from the other side of a transistor switch: 






<i,o> — ' u -* ? <i.o> 
(b) <[l,oo], [0,«]> (c) <1,1> 



-T~U, 



<i,o> 

(d) <oo,oo> 



Figure 5.22. <J,0> value as seen across various transistor switches 

This will prove very useful in describing the update operation below. 

Using the technology developed in section 5.1.1, a lattice can be constructed that indicates the 
relative ordering of the various component values: 
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Figure 5.23. Lattice for the ten possible component values 



The U operation can be used to calculate the result of considering two paths in parallel: 



<Ai, /i> U <h% h> = <hi U h 2 , h U h> 



(5.5) 



lerged separately according to the lattice given above. Similarly, two values can 



Each component is m 

be ordered by comparing their components: 

<h h li> < <h 2 , h> iff hi < hi and h < h 
A logic state can be associated with a value <h,l> using the following table: 



(5.6) 
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Table 53. Logic state associated with <h,l> 
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5.4.2. The local simulation algorithm 

The local simulator implements a relaxation-based calculation for propagating input values 

through the network. The calculation has three major steps: 

Step 1. Determine the state of each transistor switch from its type and the logic 
state of its gate node. If no switches are found that changed state since 
the last examination, the network is said to have "settled" and the 
simulator waits for more input 

Step 2. Reset each non-input node value to its charged value, a value that 
corresponds to the node's last logic state but does not have sufficient 
strength to force the value of any neighboring nodes. 

Step 3. Repeatedly pick a transistor and update the values of its source and drain 
nodes according to the formula given below, continuing until the 
relaxation is complete (no node changes value as die result of an update). 
Upon completion, return to Step 1. 

Each of these steps is described in more detail below. 

Figure 5.5 shows how a switch's state is determined from its type and the logic state of its gate 
node. Once determined, the switch state remains stable through Steps 2 and 3 even if the gate 
changes value. This arrangement is necessary for the correct operation of the simulator since a node's 
value might temporarily be incorrect during the relaxation computation while information continues to 
propagate towards the node from various inputs. For example, the output of a nand gate may 
momentarily appear to be pulled-up, because the near-by pullup affects the node's value before 
information can propagate from GND up the pulldown chain. Since there are no guarantees about the 
ordering of updates, a node's value is known to be correct only when the relaxation process 
terminates. 

Step 2 makes sure that the relaxation starts off with a clean slate; when this step is complete, 
only input nodes have values that can cause the values of neighboring nodes to change. This ensures 
that values for non-input nodes are determined exclusively by the values of the input nodes. 





<OQ,S> 


current logic state = 


charged value — 


<S,oo> 


current logic state = 1 




<S,S> 


current logic state = X 



(5.7) 



If a node is not connected to any input, the charged value is an accurate representation of its final 
value. The update calculation performs a rudimentary charge sharing computation; a charged node 
can become connected to another charged node with the same logic state, and still maintain its value. 
Connection to a charged node with a different logic state results in both node values becoming <S,S>. 
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Note that prccharge/discharge circuits arc simulated correctly. 

An update operation involves the source and drain nodes of a single transistor switch. ITie new 
values of the source and drain arc calculated from their old values and the state (a) of the switch: 



Vsource = Vsource U Swilch(o, Vdrain) 
"drain = Vdmin U switch(a, Vsource) 



(5.8) 



The function switch(a, value) formalizes our intuition about the effect on a value as it passes through a 
switch in a given state (see figure 522). The new value of a terminal node is the result of merging its 
old value with the old value of the other terminal node after it has passed through the switch. 



switch(o, <h, />) = 



CO 

<A,/> 

<h + [O.ooj, / -(- [0,o0]> 

<h + [1,1], / + [1,1]> 



a - open 
a = closed 
a = unknown 
a — weak 



(5.9) 



where "+" is the series operation described in the following table: 
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Table 5.4. Series operation for local simulator 

In general, the local algorithm's predictions are more pessimistic than those of the global 
simulator. The following Jigure illustrates the analysis performed by the local simulator for the circuit 
shown in figure 5.9. (The global simulator's analysis is shown in figure 5.10) 
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(a) original configuration 



(b) after network settles 



Figure 5.24. Local simulator analysis for circuit in figure 5.9 

As shown in figure 5.24(b), the local simulator predicts the logic state of the output node to be X — a 
pessimistic answer. (The global simulator predicts a logic state of 0.) On the other hand, the local 
simulator cannot simply adopt the value set and computation of the global simulator. The reason why 
is illustrated by the following figure. 




#3 ' — • [CL.DL] 



#2 
(a) original configuration 




#3 > »[WH45Lj 



(b) update order: #1, #2, ... 



(c) update order: #1, #3, ... 



Figure 5.25. Global simulator's computation using update operations 

The figure shows the final node values {Le., the values after the network has settled, and further 
updates make no change to the network), assuming that the first few updates were performed in 
different orders. Figure 5.25(b) shows the final node values if switch #1 is updated first, followed by 
switch #2. Figure 5.25(c) shows the final node values if switch #1 is updated first, followed by 
switch #3. As one can see, the value of the output node differs in the two examples. 

If the local simulator's predictions of the final node values are to be independent of update 
order, it must be the case that 



switchia, a U /?) = switch(a, a) U switch{c, fi) 



(5.10) 



In other words, it cannot matter if early estimates of a node's value (a) are transmitted to neighboring 
nodes before additional information (fi) arrives. Unfortunately, equation 5.10 is in direct conflict with 
equation 5.2 which indicates that order makes a difference in the analysis of certain circuits (such as 
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the one in figure 5.9) when using the global simulator's value set Thus, the local simulator cannot 
simply adopt the global simulator's value set 

5.4.3. Interesting properties of the local algorithm 

In order to answer the questions raised when first introducing the local algorithm, some 
definitions will be useful. Let S be the set of switch-state vectors 0102 •• • <r t where t is the number 
of transistor switches in the network. Similarly, let V be the set of node-value vectors viv2 • ■ • v„ 
where n is the number of nodes in the network. Then SX V is the set of possible network states. 

Definition. Let X and Y be network states. X > Y if Sx = Sy and Vx > Vy 
where comparison between vectors is done component by component 

The update operation changes one network state to another; one writes X-*Y if a sequence of zero 
or more updates changes the network state X into the network state Y. X -* m Y means that m or 
fewer updates will change X into Y. 

The update operation can potentially change two elements of the node-value vector; the switch- 
state vector is never affected by an update. Not every update causes the network state to change. For 
example, if the update chooses an open switch, the resulting network state will be the same as the 
original state. In the presentation below, it is useful to distinguish those updates that result in a 
change in the network state from those that do not: 

Definition. Let X and Y be network states. X =*• Y if X -*i Y and X * Y. 

In fact, X => Y implies Y > X, a simple consequence of equation 5.9 and the definition of U. A 
stable network state is one which does not change as the result of any update: 

Definition. Let X be a network state. X is stable if, for any network state Y, X -*■ Y 
implies X = Y. 

It follows directly from this definition that a state is stable if and only if no => operations are possible 
on the state. Once a stable state is reached, the relaxation process can safely be terminated since 
further updates will not change the network state. This suggests the following metric for measuring 
how far the relaxation process has to go: 

Definition. Let A' be a network state, order (X) is defined to be the largest integer m 
such that there exist states Y\ Y m where X => Y\=> • • • => Y m . 
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The termination of the relaxation process is assured by the following theorem: 

Theorem 5.1. For any network state A', order (X) is finite. 

The proof is based on the observation that there are only finitely many network nodes and possible 
node values. This means for any given network state X, there are finitely many states Y such that 

Y > X. Since each =*• operation produces a state strictly greater than its predecessor, one can 
perform the ^ operation only finitely many times before all the possible states are exhausted. I 

For a given starting network state, Theorem 5.1 tells us that a stable state can be reached with 
only a finite number of =* operations. In fact, one can prove that there exists a unique stable state 
for any network state, but first we must lay a little more groundwork. 

Lemma 5.2. Let W and X be network states. If order(W) = m and W => X, then 
order (X) < m. 

Suppose that order(X) > m, then there exists a sequence of => operations 
W => X => Yi => • • • => Y 0T der(xy This implies order (W) > m +1, a contradiction. I 

Lemma 53. (Church-Rosser property) Let W, X, and Y be network states. If 
W -»i X and W ■-*\ Y, then there exists a network state Z such that X -* Z and 
Y-+Z. 

Appendix 1 presents a proof based on a case by case analysis of the possible choices for X and Y, 
demonstrating for each case a sequence of updates that lead to a common state Z. 

This sets the stage for proving the uniqueness of the stable state. For readers acquainted with 
the lambda calculus, the following theorem has a familiar ring. There are many similarities between 
the update operation and X-conversion; the discussion of normal forms and the Church-Rosser 
theorem found in [Curry74] inspired the concept of stable states and the existence and uniqueness 
theorems presented here. 

Theorem 5.4. Let W, X, and Y be network states. If W -* X and W -* Y, then 
there exists a network state Z such that X -* Z and Y -*■ Z. 

The proof proceeds by induction on the order of W. If order (W) = 0, then W is stable and so 
W = X - Y = Z . Without loss of generality, if order (W) > 0, one can assume X > W and 

Y > W since if this were not the case, the result follows trivially. If order (W) = 1, the result follows 
as a direct consequence of Lemma 5 J. To show for order(W) = n + 1, first note that there exist 
states A and B such that W => A -* X and W => B -* Y. Then, by Lemma 5.3, there also exists a 
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state C such that A -*■ C and B ■-*■ C. 

order = n+1 



/ \ / \. order ^ ft 

z 
Figure 5.26. Relationship between states in proof for Theorem 5.4 

Using Lemma 5.2, note that the orders of A, B, and C are all less than n+1. Thus, by the induction 
hypothesis, there exists a state D such that X -* D and C -* D. Similarly, there exists a state E such 
that Y -* E and C -* E, also by the induction hypothesis. Finally, by a third appeal to the 
induction hypothesis, there exists a state Z such that D -* Z and E -* 2. I 
Taken together, Theorems 5.1 and 5.4 imply the following corollary: 

Corollary 5.5. Let JIT be a network state. There exists a unique network state Y such 
that Y is stable and X =»•••=» Y. 

Thus, the relaxation process terminates for any starting network configuration, yielding the same stable 
state regardless of the order chosen for performing the updates. 

One of the attractions of the local algorithm is the opportunity it affords for parallel processing, 
especially during the relaxation process. Allowing parallel updates introduces the problem of merging 
conflicting node values at the end of the updates. The simplest solution is to allow updates to happen 
simultaneously only if they operate on separate portions of the network state. With this restriction, 
each node is involved in at most one update operation, and the potential for conflict is avoided. If the 
number of available processors is a lot smaller than the number of nodes in the network, there is only 
a small probability of a processor lying idle, because there are an insufficient number of allowable 
updates. 

Parallel implementations that avoid conflicting updates are covered by the existence and 
uniqueness results obtained above, since it is easy to convert the set of updates performed at any time 
step into an equivalent sequence of sequential updates. This approach has sufficient parallelism to 
keep many current parallel architectures quite busy. However, there are architectures on the drawing 
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boards with very large numbers of processors; it is interesting to speculate about algorithms that can 
usefully employ as many processors as, say, there are transistors in the network. 

To explore the possibilities, imagine a multi-processor constructed of the following elements: 



source "■ — /\ ^ drain ^v/\ v*"^ 

[NT 




T 



Y 



gate C 

(a) transistor element (b) node element 

Figure 5.27. Simulator processing elements 

Both types of elements synchronize their operation to a four-phase global clock: 

Phase 1. The transistor element samples the values of its source and drain 
connections and calculates new values using internal information about its 
type and current state. 

Phase 2. The newly updated values are driven on to the source and drain 
connections by the transistor elements. 

Phase 3. Each node element samples one of its three connections and computes 
the least upper bound of the sampled value and its stored state. The 
connections can be sampled in any convenient order; the only 
requirement is that a connection not be ignored indefinitely. 

Phase 4. The node elements drive their connections with the value computed 
during Phase 3. 

Note that the node element is particularly capricious; it ignores two of its three connections in any 
given cycle. This complicates the notion of an update since there is no guarantee that the two node 
elements attached to the source and drain connections of a transistor element will be listening when 
the results of an update are made available. It becomes especially confusing when one of the elements 
is listening and one is not, which results in "half' an update. Of course, one can conceive of less 
bizarre node elements, but if it is possible to prove correct operations under the proposed conditions, a 
much wider class of parallel architectures will be appropriate for the local algorithm. 

The elements are wired together in a way that mirrors the topology of the network to be 
simulated; multiple node elements are used to model network nodes with a large number of 
connections. 



117- 



VDD 





(a) circuit schematic 



A 


Z"A (n) 


(n) 


© B 


GND 


A 




(b) element interconnect 



Figure 5.28. Example wiring diagram for simulator elements 



By providing one processor per transistor and node, this implementation exhibits all the parallelism 
one could reasonably expect Steps 1 and 2 of the local algorithm are accomplished in a single clock 
cycle. During Step 3, an update calculation for each transistor is performed every clock cycle. A 
wired-or'ed signal visiting all the node elements can detect when the relaxation process is complete; a 
similar signal connected to all transistor elements can indicate when the network has settled. 

This scheme is not as fanciful as it seems — the Connection Machine project [Hillis81] now 
underway at the M.I.T. Artificial Intelligence Laboratory has an architecture well suited to an 
implementation similar to the one described above. Fully configured, its one million elements would 
be able to simulate sizeable circuits at very high speeds. However, me real purpose in proposing this 
architecture is to provide a vehicle for analyzing the operation of the local algorithm in a parallel 
environment 

A key insight into the design of a parallel engine is that the value stored by each node element 
must be non-decreasing with time, Le., if v; , ..., V{ t are the values of node element i at successive clock 

cycles, then v/ < • • • < vy . The "ratcheting" of node values up the lattice, which was crucial in 

showing termination of the relaxation in a sequential implementation, must be preserved in the parallel 
implementation. With this in mind, consider adding a communications link between two node 
elements: 
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Figure 5.29. Simulation engine incorporating communication link 

Since the system must already accommodate the unpredictable behavior of node elements, the 
demands on the link are minimal; messages cannot be garbled and the network cannot become 
partitioned indefinitely. However, messages can be dropped or delivered in any order since these 
failures do not affect the monotonicity of a node's value. 

Two important questions remain to be answered about parallel implementations that allow 
conflicting updates: 

(1) Is there an analog for Lemma 5.3? 

(2) Does this parallel implementation give the same answer as the sequential 
implementation? 

The author's speculation is that both questions can be answered affirmatively. This belief is based on 

the observations that no information is lost that cannot be recalculated, and the operation of the 

switches and merging of results remains unchanged. Given that the order in which the propagation 

happens was shown to be irrelevant by Theorem 5.4, it seems unlikely that the slightly more baroque 

propagation mechanism of a parallel implementation would seriously change the picture. 
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CHAPTER SIX 



Simulation Using a Pre-compiled Network Model 



The simulation algorithms presented in previous chapters rely on examination of the surrounding 
network to determine the value of a -given node. The surrounding network is re-examined every time 
the node's value needs recalculation. This chapter investigates breaking this process into two steps: a 
single complete network analysis which builds a set of four logic equations for each node, indicating 
the types of connections between the node and vdd or gnd; and simulation, where the value of each 
node is determined by evaluating its equations built during the first step. Not only is the overhead of 
a tree walk avoided each time a node value is calculated, but evaluating logic equations is also a very 
fast operation for most computers. 

Each step is discussed in a separate section. The first section describes the derivation of logic 
equations for each network node — even those which are not directly outputs of MOS logic gates. The 
second section presents several approaches for building a logic simulator based on the evaluation of 
the node equations. V 
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6.1. Reducing switch paths to logic equations 

The switch-level algorithm in Chapter 5 determines the value of a node from information about 
the node's current connections to vdd and GND. The information is regathered each time a new value 
is calculated for the node. In most cases, only a small number of potential paths exist from a node to 
vdd and GND. This suggests that it might be economical to determine ahead of time the conditions for 
which a path exists to, say, GND. For example, the output of a NOR gate with inputs A and B is pulled 
down if either A or B is non-0. -The existence of a pulldown path can be determined by evaluating the 
expression "A or B"; a search of the network is not required to discover which pulldowns are 
currently conducting. 

This section describes the derivation of a set of four Boolean equations for each node: 

DHa An expression indicating under what conditions a path of conducting n- 
channel and/or p-channel devices exists from node A to VDD. 

DLa An expression indicating under what conditions a path of conducting n- 
channel and/or p-channel devices exists from node A to GND. 

WHa same as DHa , except the path contains at least one depletion device. 

WLa same as DLa, except the path contains at least one depletion device. 
If an expression evaluates to true (1), the corresponding path exists; if the expression evaluates to false 
(0), no path exists. Since nodes can have X values, expressions involving node values can evaluate to 
X; in mis case, the corresponding path may or may not exist The equations involve the ordinary 
Boolean operators and ("•"), OR ("+"), and not ("-»"). These operations are easily extended to 
accommodate X values: 
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The algorithm for constructing logic equations is similar to that for computing the Thevenin 
equivalent for a node (see section 4.1.2). The algorithm begins with an expanding tree walk, stopping 
when an input or dead-end is reached. During the tree walk, all switches are assumed to be on, since 
the tree walk is performed before any node values are calculated. (During simulation, the actual state 
of the switch is represented symbolically in the equation.) The algorithm continues by retracing the 
steps of the tree walk back toward the original node; during this process, the equations are built The 
equations for the terminal nodes are trivial; the following table is the analogue of figure 4.8: 
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terminal node DH DL WH WL 
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Merging the equations for two (or more) paths which join at a given node occurs in several steps. 
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(a) two paths to merge 



(b) after incorporating switches 



(c) final path equations 



Figure 6.1. Merging the equations for two paths which join 

The process begins by modifying the equations for each path to reflect the contribution of the switch 
in series with the path (figure 6.1(b)). The necessary formulas appear below. For example, DH is the 
new equation derived by combining DH with gate, the value of the switch's gate node. 



DH = 



DL = 



DH • gate 
DH • ~>gale 




DL • gale 
DL • -igate 




n-channel switch 
p-channel switch 
depletion switch 

n-channel switch 
p-channel switch 
depletion switch 



(6.1) 



(6.2) 



The equations for the "strong" paths (above) are straightforward; when the connection is made by 
regular switch, the path equation and the the switch's gate value are combined using AND. If the 
connection is made with a depletion device, the strong path is terminated. Equations for "weak" paths 
(below) are slightly more complicated since a depletion switch changes a strong path into a weak one. 
These formulas also reflect the fact that a strong path overpowers a weak path, i&, equations for weak 
paths are forced to if a strong path is present The reason for this extra complication will be clear in 
an example below. 



WH = 



WL = 



gate ■ WH • -yDL 
->gate ■ WH ■ ->DL 
DH + (WH ■ ->DL) 



gate ■ WL ■ -iDH 
->gate ■ WL ■ -iZ)// 
DL +(WL ■ ->DH) 
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n-channel switch 
p-channel switch 
depletion switch 

n-channel switch 
p- channel switch 
depletion switch 



(6.3) 



(6.4) 



After the equations for each path are modified to include the series switches, they are combined (using 
OR) to derive the final equations for the node, as shown in figure 6.1(c). When the analysis for a node 
is complete, the four equations characterize all paths from the node to vdd and GND. 
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(b) network after analysis is complete 



Figure 62. The four equations characterize all paths from node 

In other words, for each node, the surrounding network (figure 6.2(a)) has been reduced to an 
equivalent, but much simple network (figure 6.2(b)). All the information about paths in the original 
network is now stored in the node equations, where it can be efficiently utilized. For example, to 
determine if a node is pulled-down, all one has to do is evaluate the DL equation — no examination 
of the network is necessary. 

The value of node can be determined from the values of the four equations and the node's 
previous value, by table lookup: 
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Table 6.1. Node value table for equation- based simulation 
There are a few special cases which can be summarized more concisely, f For most nodes in nMOS 
circuits, DH = WL = 0, Le,, connections to vdd are made only through depletion pullups, and 
depletion devices are not used elsewhere in the circuit In this case, the value of a node is given by a 
single equation: 



node value — (WH + previous value) • ~*DL (when DH = WL = 0) 



(65) 



Equation 6.5 can be simplified further for a node that is directly pulled up (WH = 1), ie., a node 
which is the output of a logic gate: 



node value = ~>DL (when DH = WL = and WH = 1) 



(6.6) 



In most cases, therefore, calculating the value of a node requires evaluating only a single equation. 

Some examples will help illustrate the analysis. First, consider an inverter with a pass gate 
connected to its output 




Figure 63. Logic equations for output of inverter with series pass gate 



tCurrent hardware simulation engines {Pfister82, Zycad83] implement all functions through table lookup, so they can 
implement the function tabled above as efficiently as, say, Boolean operations. This is not true of most general- 
purpose machines; hence the motivation for finding simpler representations where possible. 
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Using equation 6.5, the value of C is given by C = (B • ->A + C)~*(B ■ A). The value of this 
equation is tabled below for the various values of A and B. 
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When B is 0, the pass gate is turned off, and C retains its old value. When B is 1, the pass gate is on, 
and C is the complement of A. Finally, when B is X, C is also X, except when the output of the 
inverter is the same as the previous value of C. In this case, the output retains its old value, which 
makes sense since there is nothing forcing it to change. This last statement is true only because 
WHq = B ■ -i A ; the -»A term forces the pullup equation to when the pulldown of the inverter is 
active. If the WH equation did not reflect the contribution of the pulldown, La, if WHq = B, the 
value C would be unnecessarily forced to X when the value of B was X. 

The next example is the xor gate presented in Chapter 2. 




O F = A xw B 



Figure 6.4. XOR logic gate 



The equations for each node appear in the following table. 
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These equations might seem incorrect at first — it is not at all obvious that F = AxorB. However 
simplifying the the equations for C and D shows: 

C = ~>(A + DC B) = -*(A + ->(B + C -A)-C ■ B) = ^A (6.7) 

and similarly, D = -if?. These results can be used to rewrite the equation for Fin terms of A and B: 

F = -iE = C-B+DA=-'A'B + -'BA=AXORB (6.8) 

In actual use, the equations are not simplified. The above substitutions do verify, however, that the 
equations compute the correct value for F. 

Some circuit configurations have very simple connection paths during actual operation of the 
circuit, but the circuits can appear very complicated when no information is known about the values of 
various control lines. This is especially true of a circuit containing nMOS switching logic, such as a 
barrel shifter or tally circuit If no information is available about the values of the control lines in a 
barrel shifter, it appears to short together all the incoming and outgoing data bits. The logic equations 
for a node in such a circuit can become very large — in some cases, large enough to be impractical. 
The analysis procedure monitors the size of the equations under construction. If they grow too large, 
the procedure is aborted and the node is flagged. At simulation time, the value of a flagged node is 
determined using the normal switch-level simulation algorithm.! Flagging a small number of nodes 
eases the analysis of the remainder of the circuit (The number of flagged nodes has been less than 
1% of the total number of nodes in all the designs processed to date.) Using this technique, the speed- 
up in simulation afforded by the use of logic equations can be enjoyed by circuits even where 100% 
conversion to equations is not possible. 

Keeping track of gate expressions for transistors crossed during the initial, expanding phase of 
the tree walk allows the equation-building algorithm to eliminate duplicate AND terms in the results. 



fReversion to ordinary switch-level simulation for especially complicated circuits is easily accomplished by general- 
purpose computers, but can be next to impossible for special-purpose hardware. 
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(a) original circuit 



(b) reduced circuit 



Figure 6.5. On- the- fly elimination of duplicate AND terms 

This minor optimization can reduce equation size substantially in some circuits. Consider, for example, 
a tally circuit from [Mead80]. 
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Figure 6.6. Tally circuit 

This tally circuit has three inputs: A, C, and E. A tally circuit counts the number of 1-inputs; ZO = 1 
when no inputs are high, Zl = 1 when exactly one input is high, and so on. The equations produced 
for the outputs appear somewhat complicated, for example: 

DLzi = B(A+DiC+F+EF)+CiD+E+F-E)) + AiB+C+D<C+E+F-E)) (6.9) 



WHzi = BiDE+CF+ACE) + A-D(F+C(E+B-E)) 



(6.10) 



These equations are hard to verify as they are, but they can be simplified by removing B, D, and F. 
(Again, the simulator does not simplify the equations, but this is the easiest method for us to use to 
verify the operation of the algorithm.) Using the identities B = -M, D = ~«C, and F - -»£, the 
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equations reduce to: 

DLz\ = ->A->C--iE + -iA-C'E + AC + A-~yC-E (6.11) 

WH Z \ = -lA—tCE + ~>A-C->E + A-->C--iE (6.12) 

Substituting these formulas into equation 6.5 gives 

Z\ = A--*C-->E + ->A-C~>E + -i/J-iC-E (6.13) 

As expected, 21 is true if exactly one input is high. Of course, evaluating this last equation would be 
much faster than using the original equations, 6.9 and 6.10. Unfortunately, equation simplification is a 
very time consuming operation; the computational investment required to process all the equations for 
a large circuit would probably not be recovered by decreased simulation time. In addition, the 
equations for most nodes are simple, and simplification beyond that suggested by equation 6.6 (a 
simplification which is easily recognized) does not result in much improvement 

6.2. Compiling logic equations for simulation 

It is easy to build a simulator that uses the node equations developed in the previous section. 
The simplest approach [Denneau82] is to allocate two node- value arrays; one to hold the current 
values of each node, and die other to collect new node values as they are computed. Each node is 
assigned an index which can be used to access its current value in the first array, or to store its new 
value in the second array. A simulation subroutine for the network is built by generating code that 
calculates the value of each node, where the code for one node is followed by the code for the next. 
(Since new node values are kept separate from the current node values, the order in which nodes are 
processed by the compiler does not matter.) A single simulation step, which propagates new input 
values to other nodes in the network, is implemented as follows: 

(1) For each input node, set its current-value array entry to the designated input 
value. 

(2) Execute the simulation subroutine. This fills the new-value array. 

(3) Compare the current-value and new-value arrays. If their contents are identical, 
the network has settled and the simulation step is over. Otherwise copy the 
new-value array to the current-value array, and return to step (1). 

This simulation algorithm has several interesting properties. Each execution of the simulation 

subroutine corresponds to one step of a unit-delay simulator. Node values are updated all at once in 
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step (3); hence, the simulator implements a true unit-delay algorithm as described in section 5.3.3. 
Note that no special handling of input nodes is required when generating code — the new values 
calculated for input nodes in step (2) are overridden by user-specified values in step (1). Note also 
that the calculations of the simulation subroutine are not event driven; the implications are discussed 
below. 

The value of a node is computed from its four node equations, using the code generated by one 
of the following alternatives: 

(1) If DH = W L = and WH = 1, emit code that calculates the node value using 
equation 6.6. 

(2) If DL = WL = and WH * 1, emit code that calculates the node value using 
equation 6.S. 

(3) Otherwise, emit code which evaluates each of the four node equations, and then 
concatenates the resulting values with the previous value of the node to create an 
index into Table 6.1. As an optimization, the code generator can check for other 
special cases (constant values for WH and WL) and generate accesses to smaller 
tables if appropriate. 

Code is generated for each equation using standard compilation techniques. The logic instructions of 

the target machine are used for expression evaluation. (Some provision must be made to incorporate 

X values in a way that still permits use of the native logic instructions; see the example at the end of 

this section.) Access to a node's current value requires only an indexed reference into the current-value 

array; storing generated values requires an indexed reference to the new-value array. 

There are some inefficiencies inherent in this approach. An extra execution of the simulation 
subroutine is performed during each simulation step — "extra" in the sense that the last execution 
produces the same result as the one before (that is how the simulator identifies it as the last 
execution). In addition, the value of each node is calculated during each call to the simulation 
subroutine, even if the inputs to the node's equations have not changed. 

This last objection can be addressed by making a more intelligent choice about the order in 
which node values are calculated, by identifying the nodes that afreet node A's value (Le., nodes that 
appear in the equations for A) and then generating code to compute the values of these nodes before 
generating code to compute the value of A [Case78, Denneau82]. In addition, references to a node's 
current value are directed to the new-value array if a new value for the node was computed earlier in 
the subroutine. For example, the circuit in the following figure has several cascaded logic gates. 
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Figure 6.7. Cascaded logic gates 

Under the new organization, the compiler generates code for nodes A and B before generating code 
for node E, and so on. The resulting code propagates a new input value from A to H in a single 
execution. (The earlier scheme would have required three calls to the simulation subroutine to achieve 
the same effect) 

To implement this scheme, the compiler assigns a numeric level to each node. The level of input 
nodes is defined to be 0; the level of a non-input node a is 

level(a) = 1 + max( level of nodes affecting a ) (6-14) 

Referring to the example in figure 6.7, if nodes A through D are inputs, level(E) = 1 and 
level(H) = 3. Code is first generated for level 1 nodes, then level 2 nodes, and so on. When 
compiling an equation, if a node value is needed, the node's level determines where that value comes 
from. The value of a level node is taken from the current-value array, and the value of a node with 
a level greater than is taken from the new-value array. (New values are stored in the new-value 
array, as always.) 

The definition of a node's level in equation 6.14 runs into some difficulty if the circuit has 
feedback. Consider, for example, the following circuit: 



+ ¥L 
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Figure 6.8. Circuit with feedback 

In attempting to assign a level to node K, one discovers that the definition is circular, ic, the level of 
node K is defined in terms of itself. The compiler solves this problem by arbitrarily splitting a node 
that is in the feedback loop into two nodes. One copy is treated as an input, and the other as a 
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normal network node. Both are assigned the same index so that the input value is updated each time 
the new-value array is copied to the current-value array. Thus, the circuit in figure 6.8 is compiled as 
if it had the following configuration: 

value fed back during step (3) 

y 

Figure 6.9. Feedback circuit as it appears to the compiler 

For the purposes of compilation, the feedback loop is broken; the value is actually fed back during 
step (3) above when the new- value array is copied to the current-value array. This means that a 
circuit containing feedback might require more than a single execution of the simulation subroutine 
before the network settles. As it turns out, most MOS circuits contain feedback loops since charge 
decay requires that storage nodes be refreshed. A clocked feedback loop offers special compilation 
opportunities, which are discussed below. 

Compiling nodes by level ensures that only a single execution of the simulation subroutine is 
needed to settle the network, assuming the network contains no feedback. The new organization 
introduces other differences from the original compilation strategy. Node values are not updated all at 
once in this scheme; the simulation subroutine implements a pseudo unit-delay simulation. Input 
nodes must be assigned a level of 0, which means nodes must be declared as inputs before the 
compilation process begins. This eliminates the possibility of interactive debugging, where one wants 
the capability to consider any node as an input Typically, the designer uses the original compilation 
strategy when initially checking out the circuit, and then uses compilation-by-level when performing 
long verification runs. 

Most node-value references are satisfied using the new-value array in the compilation-by-level 
scheme. This suggests that is might be worthwhile to eliminate the storage overhead and copying time 
involved for managing two arrays by merging them into a single array. This is straightforward, 
provided a new technique is developed for detecting when the simulation step is complete. If the 
circuit has no feedback, only a single execution of the code is needed. When there is feedback, a 
single execution also suffices, if the current and new value of split nodes (eg., K and K in figure 6.9) 
agree. Only when the old and new values are different is another execution required. This can be 
arranged by comparing the two values before the new value is stored into the array. If the 
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comparison shows them to be unequal, a flag is set to indicate that another execution is needed. Note 
that the whole simulation subroutine is re-executed; this is simpler than trying to untangle interlocking 
feedback loops to determine the subset of the code that must be re-executed. 

With this improvement, the compilation-by-level scheme produces a simulation subroutine that: 

(i) uses a single node-value array. 

(ii) evaluates nodes in a reasonable order: the values of a node's inputs are 
calculated before the value of the node itself is calculated. 

(hi) deals with feedback by splitting some node in the feedback loop into an input 
node (assigned level 0) and a regular node. Both nodes are assigned the same 
index, so when the value of the regular node is recomputed it updates the value 
of the input node also. Before storing the value of a split node into the node- 
value array, it is compared with the current value; if the values are different a 
flag is set 

(iv) uses the flag described in step (iii) to indicate when another iteration is needed. 
If the flag is set during an execution of the code, another iteration is performed; 
otherwise, the subroutine is finished. 

The following is an extended example which illustrates the result of a compile-by-level for a single bit 

in a nMOS counter. The circuit diagram for the counter bit is shown in the following figure. 



PHI2 



IN* 



ON* 




OOUT 



* »OTJT 



Figure 6.10. Circuit diagram for a one-bit counter 



The target machine for this example is the DEC VAX-11. A node value is 2-bit quantity (logic low = 
0, logic high ss 3, X = 1) stored in a byte location; the node-value array is implemented as an array 
of bytes. Logical ANDf and OR instructions produce the desired answers with this value encoding. 
However, using this encoding, the complement instruction does not correctly implement the NOT 



tThe VAX does not, in fact, have an AND instruction. Instead, a "bit clear" (BIC in VAX parlance) is provided, 
which implements an AND-COMPLEMENT operation. This introduces a few circumlocutions in the generated code. 
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operation, so not is performed by table lookup. The index of each node is indicated symbolically in 

the code below (the index of node A is written "uA"). 

rlO = pointer to value array 

ntbl = table giving NOT of value 

xtbl = table giving bit complement of value 

xntbl = table giving bit complement of NOT of value 



so regs can be used Index registers 
nonzero Indicates no Iteration needed 

rO = lph12 + out = 1(ph12 • lout) 



rl = (ph12 *out) + 1n 
1n = r0» rl 



a= Mn 



step: 




clrl 


rO 


clrl 


rl 


movb 


#1, iterate.flag 



1; movb PHI2(rlO),rO 

b1sb3 ntbl(rO),_OUT(rlO),rO 

movb OUT(rlO),rl 

b1cb3 xtbl(rl),_PHI2(rlO),rl 

b1sb2 IN(rlO),rl 

b1sb3 xtbl(rl).rO._IN(rlO) 



movb 
movb 



_IN(rlO),pO 
ntb1(rO),.A(rlO) 



movb _PHIl(rlO),rO 

b1sb3 ntbl(rO),_A(rlO),rO 

movb . A(rlO),rl 

b1cb3 xntbl(rl),_PHIl(rlO).rl 



rO = Iph1l + a = l(phil» la) 



b1sb2 
b1cb3 


B(rlO).rl 
xtbl(rl),rO._B(rlO) 


; rl = <ph1l • la) + b 
; b = rO • rl 


movb 
movb 


B(rlO).rO 
ntbl(rO),.C(rlO) 


; c = lb 


movb 

b1cb3 

movb 


C(rlO),rO 

xtbl(rO),.CIN(rlO),rO 
ntbl(rO)._D(rlO) 


i d = I(c»c1n) 


movb 

b1cb3 

movb 


C(rlO),rO 

xtbl(ro),_D(rlO).rO 
ntbl(rO),_E(rlO) 


; e = t(c»d) 


movb 

b1cb3 

movb 


D(rlO),rO 

xtbl(rO),_CIN(rlO),rO 
ntbl(rO),_f(rlO) 


; f = l(d«dn) 


movb 
movb 


D(rlO),rO 
ntbl(rO)..COUT(rlO) 


; cout = Id 



movb E(rlO),rO 

b1cb3 xtbl(rO), F{rlO),rO 

cmpb ntbl(rO),.OUT(rlO) 

beql 2f 

movb ntbl(rO)..OUT(rlO) 

clrb Iterate.flag 



2: 



bbcs 
rsb 



#1, Iterate.flag, lb 



check I (e * f ) against old value 

1f different, save new value 

and set iterate flag so we do 1t again 



check flag, Iterate if set 



The code is a relatively straightforward implementation of the equations for each node. Nodes PHU, 
PHI2, and CIN are designated as input nodes. Note that the feedback loop is broken by splitting 
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node OUT, an arbitrary choice. The resulting simulation is several orders of magnitude more efficient 
than a standard switch-level simulation. For example, the value of B is calculated in six instructions; 
the value of C in only two. The code is also relatively compact compared to the usual network data 
base. 

Although compiling by level greatly reduces the amount of wasted computation, there are still 

occasions when the values of nodes are unnecessarily calculated. Some input transitions have little 

effect on node values; eg., when PHI1 or PHI2 in the one-bit counter above change from 1 to 0. 

This suggests that the performance of the simulator can be improved by generating multiple simulation 

routines, where each routine corresponds to a fixed value for one or more inputs. This is particularly 

advantageous when the inputs selected for special processing have a major impact on the circuit to be 

simulated. For example, in a circuit using two clocks, three separate simulation routines can be 

generated: one generated assuming both clocks are low (called, say, CLOCK00), and the other two 

generated assuming one of the clocks was high (CLOCK10 and CLOCKOl). A four-phase clock cycle is 

simulated by executing the simulation subroutines in the correct order: 

PHI 1 high 
both clocks low 
PHI2 high 
both clocks low 

To generate a input-specific simulation routine, the user specifies which nodes are inputs, and for each 

input 

(1) gives the input's logic value, and 

(2) indicates whether the input is stable or has just changed to the specified value. 

The compiler applies several optimizations during code generationf: constant folding based on 
knowledge of input node values, and compile-time selective trace mat ignores nodes whose values 
remain unchanged. (The stable/changing specification is used by the selective trace optimization.) The 
selective trace is especially effective in reducing the amount of generated code. 

In the examples below, PHI I and PHI2 are specified as changing inputs, and CIN an 
unchanging input The first example — the code generated for the one-bit counter with both clocks 
low — illustrates just how effective the optimizations can be: 



jsb 


clocklO 


jsb 


clockOO 


jsb 


clockOl 


jsb 


clockOO 



fThe optimizations are inspired by those found in traditional optimizing compilers [Harrison77, Wulf75]. Because of 
the branch-free nature of the code and the pervasive influence of dock signals, many of the optimizations are much 
more effective in this domain than in traditional compilation problems. 



clockOO: 

clrb PHIl(rlO) 
clrb _PHI2(rlO) 
rsb 
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code forphll = 0, ph12 = 0, c1n 
phll =0 
ph12 = 



The values of PHI1 and PHI2 are set by the code since they are specified as changing inputs. (The 
value an unchanging input is assumed to be set by the user, or by code executed earlier.) Node B is 
determined to be unaffected by the change in PHI1, as are nodes IN and PHI2. In fact, the compile- 
time selective trace does not find any nodes that change value, except for the changing inputs. 

The next code sequence, corresponding to PHU going high, is somewhat longer, since that is the 
transition when the circuit performs most of its work. 



clocklO: 




; code for phll = 1, ph12 = 0, dn = 1 


clrl 


rO 


; so reg can be used as Index register 


roovb 


#3, PHIl(rlO) 


i phll = 1 


clrb 


PHI2(rlO) 


; ph12 = 


movb 


A(rlO), B(p10) 


; b = a 


movb 


B(rl0),rO 




movb 


ntbl(rO). C(rlO) 


j c = lb 


movb 


C(rl0).r0 




movb 


ntbl(rO), D(rlO) 


; d = I(c*c1n) = Ic 


movb 


_D(rl0),r0 




movb 


ntbl(rO), COUT(rlO) 


; cout = Id 


movb 


C(rl0).r6 




b1cb3 


xtbl(rO). D(rl0),r0 




movb 


ntbl(rO). E(rlO) 


; e= !(C»d) 


movb 


D(rl0),r0 




movb 


ntbl(rO), F(rlQ) 


; f = !(d*c1n) » Id 


movb 


E(rl0),r6 




b1cb3 


xtbl(rO), F(rl0).r0 




movb 


ntbl(rO), OUT(rlO) 


; out = !(e*f) 


rsb 







A node that connects to the rest of the network through a single pass transistor (e&, node B in the 
counter) is treated specially by the compiler, because such nodes are so common in mos networks. 
When the pass transistor is turned on by fixed-value input, me generated code is particularly efficient 
(a single move in the example above). 

The last code sequence, corresponding to PHI2 going high, is relatively short; the compile-time 
selective trace finds only a few nodes whose values needed to be computed. 



clockOl: 




; code forphll = 0, ph12 = 1, dn = 1 


clrl 


rO 


; so reg can be used as Index register 


clrb 


PHIl(rlO) 


; phll » 


movb 


#3. PHI2(rlO) 


; ph12 = 1 


movb 


OUT(rlO), IH(rlO) 


; in = out 


movb 


IM{rl0).r0 




movb 


ntbl(rO). A(rlO) 


; a = Mn 


rsb 







Simulation of a fou^phase clock cycle using these three routines requires executing only 36 VAX 
instructions. The earlier compiled code sequence requires 39 instructions for a single simulation step. 
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for a total of more than 150 executed instructions when simulating a full clock cycle. Input-specific 
subroutines result in a considerable improvement 

Although the impact of compile-time selective trace makes it a worthwhile optimization, only so 
many input-specific routines can be generated. Assuming that all combinations of inputs are possible, 
the number of routines needed grows exponentially with the number of fixed inputs. Thus, while 
computations caused by the changing of a few inputs can be reduced to the bare minimum, many 
unnecessary computations are still performed. For example, in a 10-bit counter, the nodes comprising 
the higher data bits are recomputed during each clock cycle, even though those nodes actually change 
value far less frequently. Presumably, the appropriate checks could be inserted into the code, resulting 
in branches around sections of code that do not need to be executed. In the counter example, when 
the carry-in of a data bit is zero, the code for its level and all higher levels does not need to be 
executed. However, a very sophisticated compiler would be needed to handle this situation. It is 
unclear what further gains will be possible in the search to reduce unnecessary computation. 

In summary, the compilation techniques discussed in this chapter are well-suited for producing 
code that implements a fast switch-level simulation of a stable design. The potential increase in 
simulation speed allows more exhaustive checkout than is possible with interactive (and slower) 
simulators. Compilation-based simulation is most appropriate for a circuit with a high degree of circuit 
activity; if each circuit component is active during each simulation step, there is very little unnecessary 
computation by the simulation subroutine. On the other hand, for a large circuit with little activity, an 
event-driven interactive simulator might actually outperform a compiled simulation. Fortunately, not 
many designers strive for designs in this latter category. 
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CHAPTER SEVEN 



CONCLUSIONS 



The models and simulators presented in this thesis were developed to fill the need for simulation 
tools suitable for large MOS designs. At the outset of the project, there were surprisingly few 
alternatives; even today, much of the work in the area of simulation tools concentrates on refurbishing 
traditional gate-level simulators and circuit analysis programs. (The current state of these efforts is 
outlined at the end of the chapter.) The work reported here takes a different approach, seeking to 
develop new algorithms, guided by the following goals: 

(1) The algorithms must be suitable for the logic-level simulation of large digital MOS 
circuits; "large" meaning circuits containing 10,000 to 50,000 transistors. 

(2) Important aspects of MOS behavior (bidirectionality, charge sharing/storage, 
pullup/pulldown ratios, etc.) should be modeled in a useful way. 

(3) Performance estimates should be calculated directly from the actual parameters 
of die circuit components. Ideally, the calculations are based on the same rules 
of thumb used by designers when estimating circuit performance. 

The RSIM simulator meets all three goals, while maintaining a reasonable balance between simulator 

performance and accuracy of predictions. Rather than performing a detailed simulation of each 

transistor's operation, RSIM uses the linear model to directly predict the logic state of each node and to 

estimate transition times when nodes change state. The net effect is a trade of some prediction 

accuracy for an increase in simulation speed. When the linear model is conservatively calibrated, its 

predictions can be used to identify problem circuits in need of more accurate analysis. Usually, a large 
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percentage of a circuit passes the scrutiny of RSIM, and so the expense associated with detailed 
simulation of the whole circuit is avoided. In addition to serving as the basis for simulation, the linear 
model can be used in timing analysis and might serve to quickly generate initial waveforms for a 
relaxation-based circuit analysis program. 

RSIM has been in use in both university and industrial environments since the spring of 1982. 
During that time it has simulated several hundred designs, ranging in size from very small to 
approximately 40,000 transistors. Because RSIM is fast enough to simulate a whole circuit, it often 
uncovers circuit flaws that have fallen between the cracks during the simulation of smaller pieces of 
the design. The trend shows that RSIM is viewed as a companion to circuit analysis, using it for all 
logic-level verification and preliminary timing analysis, and resorting to circuit analysis for those paths 
identified as critical by RSIM. 

The simulation algorithm is embedded in a liSP-like command language |Terman82] that has 
been used to write quite elaborate programs to drive the simulation and process the results. Since 
programs to prepare simulation input are much less tedious to construct than the input itself, designers 
have been able to conduct more tests than they might otherwise do. For example, it is a simple matter 
to use a set of test vectors that drive a register-transfer-level simulation as input to an rsim run, and 
compare the predictions of the two simulations, all under program control. 

With careful calibration, RSiM's predictions for combinational logic are within 30% of those of 
spice For circuits relying on analog behavior (sense amplifiers, bootstrapped nodes, etc.) or chains of 
pass devices, the predictions are less accurate. To compensate, several "escape" mechanisms exist 
which allow the designer to specify the logic thresholds and transition times for individual nodes so 
that the results of more detailed simulations can be incorporated into RSIM. Usually this mechanism 
need be invoked for only a few critical nodes (e.g., clock driver outputs). Another alternative is to 
identify subcircuits and replace them with logically equivalent circuits that can be simulated easily; a 
network preprocessor [Iler83] that performs subcircuit matching and replacement is available and has 
been used to good effect. With these enhancements, RSIM has proved to be a fairly reliable filter for 
detecting circuits in need of more careful analysis. 

For those stages of the design process that do not require performance information, a switch 
model might be more appropriate than a linear model. A switch-level simulation is particularly useful 
in the early stages of a design when one is experimenting with the organization of the logic, and sizing 
each device would be distracting. The switch models presented in this thesis are straightforward, 



-138 



especially in the treatment of X values and their effect on the network. The switch model as 
embodied in ESIM (which uses the global algorithm outlined in Chapter 5) is quite compatible with the 
linear model used in RSIM. In fact, in the current implementation both models exist side by side and 
one can choose either model when propagating a set of changes through the network. This flexibility 
is useful during initialization of a network, when performance information is not a major concern. 

Simulator performance is always an important issue, one that has been addressed throughout the 
thesis. Chapter 4 describes several techniques for speeding up the RSIM algorithm; using a compressed 
representation of logic gates and caching subnetwork calculations decreases the execution time of rsim 
by a factor of two or more. The local switch algorithm presented in Chapter 5 is ideal for 
implementation on parallel architectures. Like many relaxation algorithms, it can effectively utilize 
many processors, and so holds the promise of large performance improvements in simulation when 
parallel processors move out of the experimental stage. A different approach for improving the 
performance of switch-level simulation is described in Chapter 6, which proposes performing the 
network analysis once, before simulation, and using the results to compile a set of logic equations for 
each node. When evaluated in the proper order by a conventional computer, the resulting switch-level 
simulation is many times faster than simulation using traditional techniques. The node equations can 
also be used to develop instruction sequences for special-purpose simulation hardware — e.g., the 
Yorktown Simulation Engine, or the Zycad multi-processor — extending the benefits of high-speed 
gate evaluation to arbitrary MOS networks [Barzilai83]. 

The remainder of this chapter discusses other work in the area of simulation related to the topics 
of concern in this thesis. These topics include: 

• algorithms for fast circuit analysis; circuit analysis using simplified models 

• mixed-mode simulation 

• logic-level simulation using pre-determined transition delays 

• models for estimating circuit performance 

• other switch-level simulation algorithms 

Each of these areas is discussed below. 

The most detailed and accurate network simulation is provided by circuit analysis programs, such 
as ASTAP [Weeks73] or spice rNagel75]. The capacity limitation of circuit analysis is a prime motivation 
for the development of simpler simulation models; recent improvements in circuit analysis algorithms 
are making inroads into the traditional performance problems of circuit analysis. Device models are 
the heart of a circuit analysis program. The models are usually analytic; they contain formulas that 
predict device performance from information about voltage histories, physical properties of materials, 
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etc. In a circuit, the behavior of a particular device might be determined by several electrical nodes 
which, in turn, are affected by other devices; te, a system of circuit equations is needed to describe 
the behavior of the circuit as a whole. To make rinding a solution computationally feasible, most 
circuit analysis programs proceed in two steps: 

(1) The circuit is partitioned so that, at a particular time step, the change in voltage 
on each node is approximated as a linear function of the node voltages (and their 
derivatives). It is during this step that device models must be evaluated. 

(2) Solving the resulting set of equations numerically (see [SV80]). 

These two steps can be quite time consuming, although for large circuits the second step becomes the 
dominant factor [Newton80]. RSIM reduces both costs by using a very simple device model whose 
effects can be predicted without the need for expensive numerical techniques. 

The cost of model evaluation can be reduced by replacing the analytic device models with tables 
relating device current to terminal voltages [Chawla75, Fan77]. These tables can be derived from a 
one-time evaluation of the original analytic models, or filled directly from device measurements. In 
these simulators, the current charging/discharging of each node capacitor is determined from the 
present node voltages; thus, the change in node voltage for each time step can be calculated directly 
and the cost of solving a set of simultaneous equations is avoided. Another approach to reducing the 
cost associated with dealing with large matrices of equation coefficients uses a relaxation technique 
[Lelarasmee81, Newton83J to successively approximate the voltage waveform for each node in the 
circuit The solution for each node is computed separately, using the estimates of other node voltages 
computed during earlier iterations. Again, this avoids the cost of solving a large set of simultaneous 
equations. It is also possible to skip the recalculation of a node's waveform during a particular 
iteration if it can be determined that the estimates for the surrounding network have not changed 
substantially since the last iteration (Le., selective-trace comes to circuit analysis). These techniques can 
speed up circuit analysis hy an order of magnitude or more, but the programs are still limited to 
circuits with a few thousand components. 

Recent work on simulators has tried to combine the computational advantages of gate-level 
simulation with the precision afforded by circuit analysis; this has lead to a new family of mixed-mode 
simulators: [Chen78, Gardner79, Hill79, Agrawal8Q, Newton80]. The designer can specify gate-level or 
functional simulation for simple or previously-verified pieces of the circuit, reserving the expense of 
circuit analysis for critical sections of the design. There are two problems that remain to be solved in 
mixed-mode simulators: conversion between the different representations of node values used by the 
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different levels, and the related problem of choosing which type of simulation should be used for each 
subcircuit The designer can introduce errors into the simulation by an unfortunate choice of level at a 
critical point in the circuit; special care must be exercised to avoid discontinuities and other pitfalls of 
the numerical solution techniques. Like circuit analysis, mixed-mode analysis still requires the touch of 
an expert lest it produce misleading results. 

Clearly, it is only a matter of time before mixed-mode simulation becomes true hierarchical 
simulation in which the results of detailed low-level simulation are automatically summarized for use in 
subsequent high-level simulations. A hierarchical system would also decide what level of simulation is 
appropriate for each subcircuit Viewed in this light, RSIM can be thought of as the first step toward 
automatic identification of critical subcircuits. With a foot in both worlds, RSIM provides an easy path 
for descending into circuit analysis or for abstracting toward higher-level logic functions. 

Another approach to timing simulation that retains the speed advantages of gate-level simulation 
is determining the transition delays for each node before simulation begins. Some gate-level 
simulators [Szygenda72, Case78] allow the user to assign node delays. This type of simulator can be 
extended to handle MOS networks, after a fashion [Sherwood81, McDermott82]. The result is a system 
that can quickly calculate estimates for signal delays in a network. Unfortunately, the delays are not 
calculated automatically (and hence are prone to error or wishful thinking on the part of the designer), 
and are approximate at best for pass transistor circuits so common in MOS circuits. A more effective 
technique for pre-computing delays is the use of the results of actual measurements or circuit analysis 
runs [Pilling73, Nahm80]. The delays are measured/calculated for "standard" gate configurations, and 
the results used to estimate the performance for the actual configuration of each node in the network. 
[Nahm80] mentions several shortcomings of this approach. Circuits with multiple inputs are difficult to 
analyze since a particular input transition is chosen when performing the analysis; also, the effect of 
overlapping input transitions, the slope of the input waveform, and dynamic changes in the output 
load are not considered. (Interestingly, all these problems are solved in a straightforward way by RSIM, 
at no great loss in execution speed, as evidenced by the performance figures quoted by Nahm.) 
[Okasaki83] suggests overcoming these problems by expanding the set of "standard" configurations to 
include most of those commonly found in MOS circuits (complex and/or gates, pass gates, etc.). The 
price for the increase in accuracy is a corresponding increase in the complexity of the model for each 
gate; his simulator spends a fair amount of time determining which pre-computcd delay should be 
used, given the current configuration of the network. In summary, the performance variations 
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introduced by non-standard circuit configurations, and changes in the network due to changing node 
values seem to offset any advantages offered by pre-determined transition delays. 

Not much has been published about models that are suitable for quickly determining the 
transition times for particular network configurations. A switched linear Thevenin model is described 
in [Glasser80]; a simulator based in part on this model is described in [Tamura83]. Multiple resistances 
are used to describe each transistor; conceptually, the appropriate resistance is selected by a rotary 
switch controlled by the transistor's gate voltage. Each resistance is chosen to model the actual 
channel resistance in a particular region of device operation. Hie linear model presented in this thesis 
can be viewed as a simplification of Glasser's model, with only two possible switch positions selecting 
between resistances of Rtf and co. A simple version of the linear model also appears in 
[Ousterhout83] and [Jouppi83]; both indicate that the model improvements suggested in Chapter 3 are 
needed in order to improve prediction accuracy. [Horowitz83J presents a simple model that describes 
the performance of a network of pass gates; his model is discussed in section 3.S. 

One simulator with many of the same aspirations as the switch-level simulators described in 
Chapter S is MOSSIM, written by Randy Bryant [Bryant81]. MOSSIM uses a switch transistor model 
similar to that presented here, but its calculations are organized differently since (1) node values are 
represented using a cross-product value set and (2) the analysis is based on a static decomposition of 
the network. A major difference in the simulation calculation comes in the handling of X values and 
their effect on the surrounding network. Bryant handles such values in a separate stage of the 
computation, using global knowledge of the network configuration to resolve values of subnetworks 
connected by X transistors. (Other differences between the two approaches are discussed in Chapters 
2 and 5.) The extra complexity of his algorithms results in some degradation in simulator performance 
over that achieved by the simulators described here. 
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APPENDIX ONE 



Proof of Lemma 5.3 



Lemma 53. Let W, X, and Y be network states. If W -*\ X and W -*i Y, then 
there exists a network state Z such that X -* Z and Y -* Z. 

Recalling how the update operation works, it is not hard to believe that the Lemma is true. The value 
of a node indicates the resistance of paths from the node to VDD and GND. An update exchanges path 
information across a switch, and the U operation ensures that information is never lost (the indicated 
resistance to an input can never increase). Intuitively, an update only adds information about possible 
paths to the network state, so no matter what switch is chosen for an update, one can also go back to 
other switches latter on. 

The proof is straightforward, demonstrating how a state Z can be constructed for each possible X 
and Y. The proof depends on some simple properties of the U operation and the switch function: 

A U A = A 

A U B = BU A 

a U switchia, a) = a (Al.l) 

swilch(o> switch(o, a)) = switch(o, a) 

switch{a, a U B) = switch(a, a) U switchip, B) 

which can be verified directly from the definition of U and equation 5.9. 

If the two updates leading to states X and Y involve only one switch, X = Y and the Lemma is 
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trivially true. If two separate switches are involved, there are three cases to consider which differ in 
the number of nodes affected. 



A 1 gj I B 

C J a^ D 



"•> 



C A 





a l 

°2 





(a) Case 1 



(b) Case 2 



(c) Case 3 



Figure Al.l. Three cases in proof of Lemma 5.3 



For notational convenience, define the functions /and g to describe the effects of switch 1 and 2 
respectively: 



f(a)sswitch(ai, a) 
g(a)sswitch(o2, a) 



(A1.2) 



Each of the two updates is labeled by the switch it operates on; for example, S\ refers to an update 
involving switch 1. A sequence of updates is written as SjSj, which is taken to mean update £,-, 
followed by update Sj. 

Case 1: no nodes in common. As the following diagram indicates, when the updates have no nodes in 
common, they result in the same state when applied in either order. 



w 

SI / \ S2 



S2 



SI 



Figure A1.2. Slate diagram when no nodes in common 
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This is shown by considering the values for nodes A, B, C, and D after each update: 



sequence 


A 


B 


C 


D 


s, 


AU (IB) 


BUHA) 


C 


D 


s,s ? 


A U «B) 


BUW 


C U g(D) 


DUg(Q 


s, 


A 


B 


CUg(D) 


DUg(C) 


s* 


AUfl[B) 


BUf(A) 


CUg(D) 


D U g(C) 



The final states of the two sequences are the same, demonstrating the desired network state, Z. 

Case 2: one node in common. As the following diagram indicates, when the updates have one node in 
common, S1S2S1 is equivalent to S2S\S2- 



w 
si y \ S2 



S2 



SI 



SI w S 2 



Figure A1.3. Slate diagram when one node in common 
This is shown by considering the values for nodes A, B, and C after each update: 



sequence 


A 


B 


C 


s n 


A U fl[B) 


B U fl[A) 


C 


SiS, 


AUP 


B U fl[A) U g(Q 


C U g(B U HA)) 


S 1 S 2 S 1 


AUHB)URBU f(A) U g(Q) 


B U fl[A) U g(C) U RA U RB)) 


C U g(B U f(A)) 


s, 


A 


BUg(C) 


C U g(B) 


S* 


A U f(B U g(Q) 


B U g(C) U fl[A) 


CUg(B) 


S ? S 1 S ? 


AU«[BU g(C)) 


B U g(C) U «(A) U g(C U g(B)) 


C U g(B) U g(B U g(C) U «A)) 



Using the identities in equation Al.l, the final values of A, B, and C for each sequence can be 
simplified to 

Af inal = A U/(fi)U/(g(C)) 

K final = BUg(C)Uf(A) (A1.3) 

Cfinal = CUg(B)Ug(f(A)) 

The final states of the two sequences are the same, demonstrating the desired network state, Z. 

Case 3: two nodes in common. As in Case 1, when the updates have no nodes in common, they result 
in the same state when applied in either order. This is shown by considering the values for nodes A 



and B after each update: 
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sequence 


A 


B 


St 


A U RB) 


BUfl[A) 


S 1 S ? 


A U HB) U g(B U f(A)) 


B U f(A) U g(A U fl[B)) 


S, 


AUg(B) 


BUKA) 


s i s ? 


AUPU g(A)) 


B U g(A) U RA U #B) 



Again, using the identities in equation Al.l, the final values of A and B for each sequence can be 
simplified to 



Ajm = A Uf(B)Ug(B) 
B fiml = BUg(A)Uf(A) 



(MA) 



The final states of the two sequences are the same, demonstrating the desired network state, Z. I 
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APPENDIX TWO 



RSIM Calibration Tables for a 5/t nMOS Process 



RSiM's transistor model relies in part on three modeling resistances for each transistor in the 
network: 

R static for calculating ¥,),„, 

Rdynlow for calculating the transition time for high-to-low transitions, and 
Rdynhigh for calculating the transition time for low-to-high transitions. 
These resistances are chosen for each transistor on the basis of its geometry, type, and usage in the 
circuit. The static resistance is chosen to obtain a good prediction for the 0-output voltage of a logic 
gate. Actually this constrains only the ratio of the n-channel and pullup static resistances, so there is 
considerable freedom in choosing these values. 

The dynamic resistances for each transistor type are specified in the following diagram. Because 
of their special nature, depletion devices configured as pullups.are treated separately from other 
depletion devices. 
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transistor type 


dynlow 


D 

dynhigh 


n-channcl 


Tables A2.1 & A2.2 


Table A2.3 


depletion 


(see text) 


Table A2.4 


pullup 


— 


Table A2.5 



The tables appear at the end of this appendix. Rdynlow is not needed for a pullup, but might be 
needed for other configurations of depletion devices (eg., if one appeared in a pulldown path). If 
desired, a very high Rdynlow can be specified for depletion devices to flag the use of a depletion device 
in a pulldown path. 

The tables below were prepared by analyzing the simple SPICE experiments proposed in section 
2.4. As mentioned in that section, more sophisticated experiments might be more appropriate for 
designers who wish to push rsim to its limits. These tables are used by examples in the thesis; for 
actual simulation, some of the values should be derated (increasing the resistance) to ensure 
conservative estimates. 

The experiments were run using version 2G.5 of SPICE with the following device models (a 
typical Sfi nMOS process): 

MODEL ENH NMOS (LEVEL = 2 VTO=1.0 PHI = 0.56 GAMMA = 0.4 CGSO=4.5E-10 PB=0.86 
JS = 1E-18 CJ = 7.2E-5 CJSW=3.6E-10 T0X = lE-7 NSUB = 1.0E15 XJ = lE-6 LD=0.7E-6 
UO = 690UCRIT = 1E5UEXP=0.12MJ=0.6MJSW=0.27) 

MODEL DEP NMOS (LEVEL = 2 VTO=-3.3 PHI=0.55 GAMMA=0.47 CGSO=4.5E-10 PB = 0.85 
JS = 1E-18 CJ = 7.2E-5 CJSW=3.6E-10 T0X = lE-7 NSUB = 1.0E15 XJ = lE-6 LD=0.7E-6 
UO=690UCRIT = 1E5UEXP = 0.12MJ = 0.5MJSW=0.27) 

Rise time is measured as the length of time needed for an output to rise from volts to 2.134 volts — 
the balance point of a 4:1 inverter built using this process. (Section 3.3.1 explains why the balance 
point is chosen as the threshold.) Fall time is the length of time needed for an output to fall from 5 
volts to the threshold. 

Note that widths and lengths are shown in microns, and the table values are in units of KQ per 
square of channel; one must multiply the appropriate table entry by the number of squares of channel 
(length-*- width) to get a transistor's resistance. For table entries marked "*", no value is available 
because of a spice bug. 
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p 




Length 


K enh 


5 


10 


20 


30 


40 


50 


100 


Width 


5 


8.7 


13.6 


16.2 


17.1 


17.5 


17.8 


18.4 


10 


8.8 


13.7 


16.2 


17.1 


17.6 


17.8 


18.5 


20 


8.8 


13.8 


16.3 


17.3 


17.8 


18.0 


18.9 


30 


9.0 


13.8 


16.5 


17.4 


17.9 


18.2 


19.2 


40 


9.6 


14.0 


16.6 


17.6 


18.1 


18.5 


19.6 


50 


10.0 


14.0 


16.8 


17.7 


18.3 


18.7 


20.0 


100 


10.0 


15.0 


17.0 


18.7 


19.3 


19.8 


21.9 



Table A2.1. Channel resistance (KQA2J for n- channel pulldovms 



T> 




Length 


enh-thresh 


5 


10 


20 


30 


40 


50 


100 


Width 


5 


16.0 


26.3 


31.5 


33.3 


34.1 


34.6 


35.6 


10 


16.6 


26.9 


32.1 


33.7 


34.6 


35.0 


35.9 


20 


17.6 


28.0 


32.9 


34.4 


35.1 


35.5 


35.5 


30 


18.6 


28.8 


33.5 


34.8 


35.4 


35.8 


36.4 


40 


19.2 


29.6 


33.8 


35.1 


35.7 


36.0 


36.6 


50 


20.0 


30.0 


34.3 


35.3 


35.9 


36.2 


36.8 


100 


22.0 


31.0 


35.5 


36.3 


36.8 


37.0 


37.6 



Table A2.2. Channel resistance (KQA2) for n-channel pulldovms with threshold drops 



R 




Length 


K enh-sf 


5 


10 


20 


30 


40 


50 


100 


Width 


5 


12.6 


22.8 


28.8 


31.2 


32.5 


33.5 


36.7 


10 


12.8 


23.1 


29.5 


32.2 


34.0 


35.4 


40.5 


20 


12.8 


23.6 


30.8 


34.3 


26.9 


39.0 


48.1 


30 


13.2 


24.3 


32.1 


36.5 


39.8 


42.7 


55.7 


40 


13.6 


24.8 


33.6 


38.5 


42.7 


46.4 


63.3 


50 


14.0 


25.5 


35.0 


40.7 


45.6 


50.1 


70.9 


100 


14.0 


28,0 


41.5 


51.3 


60.3 


68.6 


- 



Table A2.3. Channel resistance (KQ/D) for n-channel source-followers 
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p 




Length 


K dep-sf 


5 


10 


20 


30 


40 


50 


100 


Width 


5 


3.0 


4.3 


5.0 


5.3 


5.4 


5.6 


6.0 


10 


3.0 


4.3 


5.1 


5.4 


5.7 


5.9 


6.6 


20 


* 


4.4 


5.3 


5.8 


6.2 


6.4 


7.8 


30 


* 


4.5 


5.6 


6.2 


6.6 


7.0 


8.9 


40 


* 


4.8 


5.8 


6.5 


7.1 


7.6 


10.1 


50 


• 


5.0 


6.0 


6.8 


7.5 


8.2 


11.3 


100 


* 


* 


7.5 


8.7 


10.0 


11.2 


17.2 



Table A2.4. Channel resistance (KQ/D)for depletion source-followers 



r> 




Length 


K dep 


5 


10 


20 


30 


40 


50 


100 


Width 


5 


8.8 


15.1 


18.6 


19.9 


20.4 


20.8 


21.6 


10 


8.8 


15.2 


18.7 


19.9 


20.5 


20.8 


21.6 


20 


* 


15.2 


18.8 


20.0 


20.6 


21.0 


21.7 


30 


* 


15.3 


18.9 


20.1 


20.7 


21.0 


21.8 


40 


* 


15.5 


19.0 


20.1 


20.8 


21.1 


21.9 


50 


* 


15.5 


19.0 


20.3 


20.9 


21.3 


22.0 


100 


* 


* 


19.5 


20.7 


21.5 


21.8 


22.5 



Table A2.5. Channel resistance (KQ/D) for depletion pullups 
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APPENDIX THREE 



Approximation for Resistor Divider and Series Resistor 



As part of the incremental computation for the Thevenin equivalent of a network, it is necessary 
to approximate a resistor divider and series resistance (figure A3.1(a)) by a simple resistor divider 
(figure A3.1(b)). 



*W< m...i l VhJ 




[R,,R h ] 
VW — • 




[Q,.Q h ]> (Bjjy 



(a) initial network (b) approximation 

Figure A3.1. Initial resistor network and desired approximation 

As usual, each resistance is potentially a resistance interval. An exact choice for the modeling 
resistance is impossible (as will be shown below) so the goal of this appendix is the choice a suitable 
approximation. 

Consider a resistor divider with pullup resistance P and pulldown resistance Q. 
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K lhev 

■AAAr 



¥ thev 

(a) resistor divider (b) Thevenin equivalent 

Figure A3.2. Resistor divider and Thevenin equivalent 



The parameters of the Thevenin equivalent are 

and R thev = P | | Q 



Vthev - -y^Q 



(A3.1) 



which can be rearranged as linear equations relating R t hev and Vthev- 



Rthev = P Vthev and R t hev = <?(!- V t hev) 



(A3.2) 



If P and Q are intervals — P = [Pi, Ph] and Q = [Qj, Qh] — then the Thevenin parameters also are 
intervals: 



Vthev - I 



Qi 



Qh 



Qi + Ph' Qh + Pi 



] and R they =[Pi\\ QuPh 1 1 Qh ] 



(A3.3) 



If one plots the Thevenin parameter values (R t hev vs. VthevX as P and Q are varied independently 
over their respective intervals, equation A3.3 suggests the resulting area would be rectangular, but this 
is not the case, as is illustrated by the following figures. 



v thev 



-^v. 



thev 




thev 



l>v 



thev 



- t * V, 



thev 



(a) P, Q constant 



(b) P constant, Q varying 



(c) P varying, Q constant 



Figure A3 J. Thevenin plots as P and Q are varied one at a lime 



Equation A3.2 tells us that if, say, Q is held constant and P is varied, the plot is a straight line of slope 
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Q, which, if extended, would intersect the /?,/, cv axis at V,hev = 1 (see figure A3.3(c)). When both P 
and Q are varied (see figure A3.4), the plot produces a diamond-shaped quadrilateral, and not a 
rectangle. 



''thev 



PVh 



1<W 




p hllQh ' 



PjllQ! - 




ttrr**. 



thev 



Q,+P h Q h +P, 

Figure A3.4. Thevenin plot as P and Q are varied simultaneously 

Although the limits of R t he\ and V t hev are the ones shown in equation A3.3, certain combinations of 
Thevenin parameters permitted by the equation are clearly ruled out by the diagram above. 

If a series resistance R is now added, the resulting Thevenin plot is shown in the following 
figure. 



(W 



[Q,.Q h ] 




[R,,R h ] 
AAA/ — • 



R h + p hHQh 



•^thev 



Rl + PjllQ) 




■*-i* v . 



thev 



Ql+P h Qh+Pi 

Figure A3.5. Thevenin plot when series resistance is added 

The result is not a plot of a resistor divider at all. In order to approximate the circuit by a divider, a 
decision is needed concerning which information to preserve with the approximation. 

Since the approximation under development is used to calculate V t / Kv , it is important to preserve 
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information about the maximum and minimum of the circuit's voltage. This constraint fixes the right 
and left vertices of the diamond. The top and bottom vertices are constrained by the choice of 
resistance information to preserve; since it is better to overestimate than to underestimate resistances, 
the minimum value of R t f,ev is preserved. The resulting divider is shown graphically in the following 
figure. The voltage constraints are shown as dashed vertical lines; the resistance constraint as the 
circled vertex. 



[A,.A h ] 



PW 




»! + PiHQj h 



Figure A3.6. Thevenin plot showing approximating divider 




The values for Ai and B\ are determined by the second constraint and equation A3.2: 



Ri + (Qi 1 1 Pi) = ^ 



Qi 



Pi + Qi 



and Ri +(Qi\\ Pi) = 5/(1 - 



Qi 



Pi + Qi 



) (A3 A) 



This fixes the two lines that form the bottom half of the diamond. Next, the values of Ah and Bh are 
chosen so that the left and right vertices of the diamond have the same V t hev coordinates as in figure 
A3.5: 



Bi 



Qi 



A h + Bi P h + Qi 



and 



B h 



Qh 



Ai + B h P { + Q h 



(A3.5) 



Solving equations A3 .4 and A3. 5 for the parameters of the approximating divider yields: 



Ai = p, + R{ + R, 



—- Ah = Ph ■+ Ri~et + R iyr 
Qi Pi Qi 



Bi = Q, + Ri + R,<=L B h =Q h + Ri^- + R& 
Pi Qi Pi 



(A3.6) 
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Note that all resistances are greater than the minimum resistance of the scries resistor (/?/). A 
different choice of what resistance information to preserve (as was made in early versions of rsim), 
might cause A[ and B\ to be less than /?/, leading to pessimistic voltage predictions for some nlvlOS 
circuits. 
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